It’s (Still) All About the Data

I’ve been doing infrastructure for the past 15 years in many different roles – user, architect, partner, OEM pre-sales, etc. – across Windows, Linux, servers, virtualization, storage, backup, BC/DR and hyperconverged infrastructure. It was fun and fast moving, and it solved a lot of interesting problems. However, recently I’d found myself wanting to solve problems that infrastructure just wasn’t solving, or couldn’t. I was also looking for a wholesale change to really push me out of my comfort zone and mix things up, while still staying in technology and getting in pretty early at a start-up. And honestly, I love change. Big change. I was pretty comfortable with the technology sphere I was working in and could have stayed on that path, which meant I probably shouldn’t :)

As we used to say at SimpliVity, “It’s all about the data”. And we were right. But the manner in which we talked about ‘the data’ was block-based. As I’ve watched how businesses run, expand and learn from their data, I’ve seen a lot of challenges that couldn’t be addressed just by housing and moving (or not moving, in the case of SimpliVity’s DVP) the blocks of data. This year’s X is faster, bigger and cheaper than last year’s. I wanted to solve challenges with the data itself. How do we access, connect, use, protect, govern, and free the data in a world of lakes and pools (data in MSSQL, NoSQL, Parquet files in S3 buckets, Hadoop, etc.), GDPR, Cambridge Analytica, PIPL, and a whole host of other data regulations? If data truly is the ‘new oil’, that’s where I wanted to be.

Enter a conversation with Immuta, Inc., a company built out of data management in the Intelligence Community, specifically around data scientists’ access to highly regulated data for analytics and ML. How can analytic models be built on data that’s so heavily gated without getting in the way of time to value? How can we guarantee privacy tied to a specific access intent? How can this be done no matter the data source? Great questions! Some I was thinking of, many I was not. Connectedness and governance, for data scientists, for regulated data, for ML. Now, the big question: how can we do this without copy data management hell? Without copying, re-copying, de-identifying, and copying again, only for the model’s data to change and the whole process to repeat? How can we grant time- and intent-based third-party access to our data and guarantee auditable privacy?

Immuta provides a common read-only data access interface, geared toward repeatable model training without copy data management and without forcing data science teams to learn a new tool. They can keep using the tools they already have for modeling and intelligence (e.g. Spark, DataRobot, Domino, H2O, Tableau, PostgreSQL) and join across them for cross-source analytics. Data scientists get personalized access to data based on projects and subscriptions. Data owners get to grant self-service access to all their data types and services.
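To make that a little more concrete, here’s a minimal sketch of what that kind of access might feel like from the data scientist’s seat, assuming the access layer is exposed through a standard PostgreSQL-compatible endpoint. The host, database, credentials and table names below are hypothetical, purely for illustration, and this is not a description of any specific product API.

```python
# Illustrative only: a hypothetical PostgreSQL-compatible endpoint fronting
# governed data sources. Host, credentials, and table names are made up.
import psycopg2

conn = psycopg2.connect(
    host="data-access.example.internal",  # hypothetical access-layer endpoint
    dbname="project_churn_model",         # a per-project view of subscribed data
    user="jane.datascientist",
    password="not-a-real-password",
)

# One read-only query joining two "virtual" tables that could live in
# different back ends (e.g. a warehouse table and Parquet files in S3).
with conn.cursor() as cur:
    cur.execute("""
        SELECT c.customer_id, c.segment, t.total_spend
        FROM customers c
        JOIN transactions t ON t.customer_id = c.customer_id
    """)
    rows = cur.fetchall()
```

The point of the sketch is the workflow, not the syntax: no copies are made, and the same query works regardless of where the underlying data actually lives.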

Then there is an amazingly rich policy engine for data governors and compliance teams, allowing them to create condition-based rules for data access expressed as natural language filters. Data is hidden, masked, redacted, and anonymized on the fly in the control plane, based on the attributes of the user accessing the data and the purpose under which they are acting. All of this is done with granular audit controls and reporting (see the exact query, by whom, when, under which access rule, and so on).
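As a thought experiment (this is not Immuta’s engine or API, just the general idea of attribute- and purpose-based controls), the concept might be sketched like this; the attributes, purposes and masking rules are invented for illustration.

```python
# Conceptual sketch of attribute/purpose-based masking, invented for illustration.
import hashlib

def mask_row(row, user_attrs, purpose):
    """Return a copy of `row` with columns redacted, hashed, or passed
    through based on who is asking and why."""
    out = dict(row)

    # Direct identifiers: only visible for a fraud-investigation purpose
    # by users holding the 'compliance' attribute; otherwise redacted.
    if not ("compliance" in user_attrs and purpose == "fraud_investigation"):
        out["email"] = "REDACTED"

    # Quasi-identifiers: consistently hashed for analytics purposes so
    # joins still work, but the raw value is never exposed.
    if purpose == "analytics":
        out["customer_id"] = hashlib.sha256(
            str(row["customer_id"]).encode()
        ).hexdigest()[:12]

    return out

# Example: an analyst acting under an 'analytics' purpose.
print(mask_row(
    {"customer_id": 1042, "email": "a@example.com", "total_spend": 311.20},
    user_attrs={"analyst"},
    purpose="analytics",
))
```

The appeal, to me, is that the rule lives with the data rather than with a copy of it, and every evaluation of it can be audited.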

I’ve taken a role with Immuta as a Sr. Solutions Architect, tasked with helping build out their first go-to-market team. This decision didn’t come easily, as it mostly means farewell to the majority of my interactions with the virtualization and storage worlds. Of course, I’ve made some of my best friends there, so I’m not going away :) But it does mean fewer VMUGs, and 2018 was likely my last year as a vExpert. I’m excited and a bit nervous about the degree of change ahead, but boy, am I looking forward to it.

Let’s Simplify Things

Over the past few decades, many technologies have come into the datacenter to increase both the quality and the number of services. Some of these solve direct technical problems (WAN acceleration), some solve capacity and utilization problems (deduplication, virtualization, SAN/NAS), and some solve ‘keep the business alive’ problems (backup, BC/DR, etc.). With each of these solutions came another box consuming more space, more power, more cooling, more money, and on and on. On top of the physical ‘mores’ is the ‘complexity more’. More software. More management consoles. More throats to choke. Though these silos of infrastructure did bring less of a few things: less productivity, less agility, less flexibility, etc.

Enter convergence. At its base, convergence is about simplicity and efficiency. As Stu Miniman said, “Customers, however, are not looking to buy ‘convergence’”; they’re looking to solve the problems of complexity, inefficiency and inflexibility. And not just solve them for one application, silo or data set, but solve them at scale.

A number of vendors come straight to mind when talking about convergence, but one thing that’s certain is that this is a change in how the business of the datacenter is done. As we know, change is not always met with open arms within an industry. As Andre Leibovici pointed out when linking to an article about change in an industry: “disruption will always be challenged by standards”.

Traditionally, datacenter projects include deploying a storage product, a backup product, a replication product, a compute product and a virtualization product, often taking days to weeks to implement. Enough. Let’s change this. Let’s simplify.

For this reason, I’m excited to join an amazing team (Gabriel Chapman, Ron Singler, Dave Robertson, Mario Brum, and others) at SimpliVity. Their take on hyperconvergence is, I feel, the most complete to date, and the efficiency and simplicity with which they collapse silos of datacenter technologies is refreshing. I’m excited about integrating more than just storage, compute and a hypervisor in a platform: also fine-grained, policy-based backup, deduplication, compression, WAN optimization, replication and more, in a building-block architecture with one management console.

I’m extremely grateful to my friends at Computex for the opportunity to work with them, helping customers along the virtualization path, and I look forward to continuing conversations about simplifying datacenters. Let’s do it.


TL;DR – I’m joining SimpliVity: yay :-)

Winds of Change

Way back in 2012 (remember that year? Yeah, those were the days), I was invited to join a great team at Nexenta Systems. One of my primary roles on that team was to manage the internal deployment of VMware’s vCloud Director. The use case was pretty typical: carve up resources for Engineering, Sales, Support, Training, etc. so each could have its own datacenter playground. It was really fun to train, watch and work with each group as they used those resources to develop their vApp catalogs, and to see some lightbulbs go on about what’s possible with Infrastructure as a Service: from the Training group building full-on, repeatable, deployable-in-minutes labs for students, running nested instances of both NexentaStor and vSphere, to Support spinning up analytics machines to process core dumps or quickly test a process/change set, only to destroy them minutes later.