13 Problem Areas in Data Infrastructure to Build NewCos

Sharing the problem areas and opportunities where large, rapidly growing new challengers could emerge

A year ago, we decided to start open-sourcing our thinking around problem areas of interest, starting with 15 Problem Spaces in Developer Tools & Infrastructure We’re Excited About at Venrock, in the hope that it would spark conversations and bring teams together to go after exciting opportunities. With so much having changed in the world, we decided to follow up almost a year later with our next iteration.

The last 12 months have been a technology tipping point for businesses in the wake of remote work, the increased need to leverage data for faster decision making, growing pressure to move workloads to the cloud, and the realization that technology investment is a competitive advantage in a digital world. The digital transformation we’ve seen play out over the last few years just compressed the next five years of progress into one.

Companies such as Astronomer*, DBT, Decodable*, Imply, Materialize, Superconductive, and more — built on core open source projects — have seen meteoric rises as a result of increased focus on data engineering and delivering business value through unlocking data velocity.

Private valuations soared for companies such as Databricks, Confluent, and DataRobot, ushering in massive infrastructure transitions at legacy incumbents such as Cloudera, Talend, Oracle, and Informatica as they modernize their enterprise capabilities.

Public companies such as Snowflake, MongoDB, Cloudflare*, and Twilio are seeing historic 20x to 60x EV/revenue multiples as ‘digital transformation’ shifts into second gear, with a concerted focus on modernizing the infrastructure and data planes to unlock data as a competitive edge and reduce operational overhead so teams can move faster. We’ve previously written about this as the evolution to an ‘everything as code’ model and the era of programmatic infrastructure.

While a lot has changed in the last 12 months, much also remains the same, with continued opportunities to evolve how organizations build services, deploy infrastructure, distribute resources, increase data velocity, secure applications, and begin to leverage machine learning for workload-specific optimizations.

If the 2010s represented a renaissance for what we can build and deliver, the 2020s have begun to clearly represent a shift to how we build and deliver, with a focused intensity on infrastructure, data, and operational productivity.

As we look forward over the next 12–24 months, here are 13 more problem areas we’re excited about. If any of them resonate with you, or if you have comments or thoughts, please reach out!

  1. Persistence layer replication is still an unsolved problem in true multi-cloud deployments. The evolution of multi-cloud is allowing applications to become more cloud-agnostic. You can now deploy wherever there is capacity or specialized services available. But while you can elastically scale up your application servers, there is no comparable way to auto-scale your persistence layer. As soon as you talk about disk storage, cross-data-center communication latency becomes untenable. The bigger your persistence layer footprint and the more sharded your data, the more replication becomes an architecturally limiting problem.
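
To make the latency point concrete, here is a rough sketch of why synchronous replication across clouds hurts. It assumes a simplified model in which a write is durable once a quorum of replicas has acknowledged it, and it ignores disk and queueing time. The function name, replica names, and round-trip times are all hypothetical, chosen only for illustration.

```python
from typing import Dict


def quorum_write_latency_ms(replica_rtt_ms: Dict[str, float], quorum: int) -> float:
    """Latency of a synchronous write under a simplified quorum model.

    The write commits when the quorum-th fastest replica has acknowledged it,
    so latency is the quorum-th smallest round-trip time.
    """
    if not 1 <= quorum <= len(replica_rtt_ms):
        raise ValueError("quorum must be between 1 and the number of replicas")
    # Acks arrive in order of round-trip time; the write commits on the quorum-th one.
    return sorted(replica_rtt_ms.values())[quorum - 1]


# Hypothetical round-trip times (ms) from the writer to each replica.
REPLICA_RTT_MS = {
    "same_dc": 0.5,
    "other_cloud_same_metro": 2.0,
    "other_cloud_other_continent": 80.0,
}

if __name__ == "__main__":
    for quorum in (1, 2, 3):
        print(quorum, quorum_write_latency_ms(REPLICA_RTT_MS, quorum))
```

Under these assumed numbers, a quorum of two stays in the low single-digit milliseconds, but requiring every replica to acknowledge pushes each write to the speed of the slowest cross-continent link. That trade-off between durability across clouds and write latency is the limiting factor described above.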

*Venrock portfolio company

Venture Capitalist, Partner @Venrock, writing about software & hard things for developers, space, and modern computing.