The New Database Model: Venrock & Uncorrelated’s Investment in Dgraph
The Database Shift
Thanks to the cloud, the amount of data being generated and stored has exploded. Every aspect of the enterprise is being instrumented to produce data so that new operations can be built on top of it, pushing every company to become a data company.
One of the most profound, and perhaps least obvious, shifts driving this is the emergence of the cloud database. Services such as Amazon S3, Google BigQuery, Snowflake, Databricks, and others have solved computing on large volumes of data and have made it easy to store data from every available source. Enterprises want to store everything they can in the hope of delivering improved customer experiences and new market capabilities.
The Database Explosion
This rapid growth in enterprise data has created new scale and performance needs. As a result, new database engines are emerging to meet specific demands for availability, horizontal scale, and latency, depending on the type of data they store and serve.
According to CB Insights, database companies have raised over $8.7B over the last 10 years, with almost half of that, $4.1B, coming in just the last 24 months, up from $849M in 2019.
There are now separate databases for analytics, transactional data, graphs, and time series, as well as for caching, search, indexing, events, and more. There are systems built specifically to power ad-hoc analytics, systems for unstructured or semi-structured data, columnar and row-based stores, and different systems for fast reads and for fast writes.
Each comes with different trade-offs in latency, throughput, high availability, horizontal scale, distributed consistency, failover protection, and partition tolerance, and in whether it is offered as a serverless or fully managed service.
As a result, enterprises on average store data across seven or more different databases, e.g., Snowflake as the data warehouse, ClickHouse for ad-hoc analytics, Timescale for time-series data, Elastic for search, S3 for logs, Postgres for transactions, Redis for caching or feature stores, and Cassandra for complex workloads. And that’s assuming you are colocated in a single cloud and have built a modern data stack from scratch.
Mo’ Databases, Mo’ Problems
But no one wants to introduce yet another new database into their architecture to solve only part of the problem.
The cost of every new database introduced into an organization is an N² problem relative to the number of databases you already have, since each new system potentially has to integrate and stay consistent with every existing one. Between migrating to a new architecture, schema, and configuration and re-optimizing for rebalancing, query planning, scaling, resource requirements, and more, the return, measured as value / (time + cost), is often close to zero. It may come as a surprise, but billions of dollars’ worth of Oracle instances still power critical apps today, and they likely aren’t going anywhere.
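As a back-of-the-envelope illustration of the N² claim, the sketch below assumes (our simplification, not a measured figure) that every pair of databases in an organization needs its own integration path: data movement, consistency checks, or sync jobs. Under that assumption the total number of pairs grows quadratically, n(n-1)/2, and each new database adds one pair per database already present. The function names are hypothetical.

```python
def pairwise_integrations(n: int) -> int:
    """Distinct database pairs among n databases: n choose 2 = n(n-1)/2."""
    return n * (n - 1) // 2


def marginal_integrations(n_existing: int) -> int:
    """New pairs created by adding one database to n_existing databases."""
    return pairwise_integrations(n_existing + 1) - pairwise_integrations(n_existing)


if __name__ == "__main__":
    for n in (2, 4, 7, 10):
        # Columns: databases, total pairs, pairs added by one more database.
        print(n, pairwise_integrations(n), marginal_integrations(n))
```

With seven databases there are already 21 pairs, and adding an eighth introduces 7 more, so the integration surface keeps growing faster than the number of systems.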
In reality, enterprises and developers are trying to solve underlying, fundamental problems, driven partly by the explosion in data scale and partly by the need to declutter and streamline their existing data store architectures.
This is further reinforced as the DBA role has quickly given way to application and data engineers who need their systems and services to just work out of the box. No one wants to think about resources, connection pooling, cache logic, vacuuming, query planning, or updating indexes. Teams today want a robust set of systems, services, and endpoints that are easy to deploy and just work.
Future of the Database
We’ve long believed that the future of databases, the vast majority of which power applications and operational needs, would look a lot more like software, with declared end states and expressive goals: taking all the work a DBA would spend weeks and months on and moving it into software. This meant the database would:
- Scale right out of the box
- Allow for flexible schemas and data structures
- Decouple the data schema from the API
- Build sharding and replication logic into the core
- Remove the trade-off between fast reads and fast writes
- Support increasingly complex queries without substantial latency trade-offs
- Support high availability with a single click
- Be more than just a database, solving the application backend problem
We didn’t believe any of the emerging database engines were moving in this direction. Instead, they were all incremental improvements on their predecessors.
That was until we met Dgraph.
Salil and Ethan met Dgraph some years back.
Salil first met Dgraph in 2016. The graph databases that existed at the time were not truly distributed: they ran fine on one node but relied on a variety of architectural hacks to run on multiple nodes, and were thus not truly scalable. Either you found horizontally scalable NoSQL databases with “graph overlays” that were not built from the ground up as graph databases, so graph-like queries performed poorly, or you found genuine graph databases that were not architected to scale horizontally.
Ethan first met the original Dgraph team almost two years ago. Before joining Venrock, he had built multiple distributed systems that attempted to use graph databases (e.g., Neo4j), but none would scale. It was immediately clear to him that the Dgraph founders were world-class technical leaders with deep experience in distributed systems and in building graph systems at scale. They could see around the corner to where the market was going. They recognized the need for a better graph database and platform, one that could both enable emerging use cases that existing architectures weren’t designed to support and serve as an entire application backend, letting developers build applications with their whole backend, database and service layer alike, in a single solution.
Since its initial open-source release in 2016, Dgraph has become the most popular open-source graph database on GitHub, with over 18K stars, 15M downloads, and a passionate community.
The team built the platform to serve both as a distributed graph database engine and as a highly flexible backend native to GraphQL, the open-source API query language developed at Facebook to make APIs fast, flexible, and developer-friendly. As an alternative to REST, GraphQL lets developers construct requests that pull data from multiple data sources in a single API call. But while GraphQL is a very powerful query language, most of the data behind it is stored in relational or document databases designed for other query languages such as SQL, not for GraphQL, so applications are never able to leverage its full power.
Dgraph’s native GraphQL support, combined with its underlying graph engine, can let developers store data in flexible ways, without needing to update APIs as the data model or schema changes, and change schemas whenever they want without having to rework their data models. This allows developers to use Dgraph as their entire backend rather than just their database.
At the same time, the core graph database engine was light-years ahead of its time. Built on a distributed architecture modeled after Google Spanner, it was designed as a general-purpose database rather than just for graph analytics: able to scale horizontally, with built-in high availability, lightning-fast query execution, and real throughput, meaning queries continue to perform even as traffic or concurrency increases.
The Dgraph Model
Fast forward to today: Salil brought on Akon Dey, Ph.D., as the newly joined CEO, and Gajanan Chinchwadkar, Ph.D., as VP of Engineering. Akon has spent over two decades building large-scale database systems at companies including Yahoo! Inc. and Visa, and holds a Ph.D. in distributed database systems. Gajanan is an industry veteran with more than 30 years of experience in distributed database systems and multi-modal query engines.
Both saw the power of the underlying Dgraph technology as a general-purpose database that offered:
- Vertical and horizontal scale out of the box, thanks to its distributed architecture based on Google Spanner, with the ability to run the same query everywhere as if querying a single database.
- Ultimate schema flexibility regardless of how the data distribution may change over time. No more thinking about ‘tables’.
- Automatic sharding without the N+1 and network broadcast problems when running a query in high fanout scenarios.
- Synchronous replication across all replicas
- Low latency and high throughput without the traditional trade-offs
- A native GraphQL interface that offers more than just a database, solving the application backend problem.
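To make the N+1 and fan-out point in the list above concrete, here is a toy model, not Dgraph’s implementation: it assumes a hypothetical cluster that places nodes on shards by `node_id % NUM_SHARDS`, and counts simulated network round trips. The naive traversal issues one request per neighbor (the N in N+1), while the batched version groups neighbors by owning shard and contacts each shard at most once, and only the shards that actually hold requested ids, instead of broadcasting. `ToyCluster`, `shard_of`, `fetch_naive`, and `fetch_batched` are all illustrative names.

```python
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical cluster size


def shard_of(node_id: int) -> int:
    # Hypothetical placement rule: simple modulo hashing by node id.
    return node_id % NUM_SHARDS


class ToyCluster:
    """In-memory stand-in for a sharded store that counts round trips."""

    def __init__(self, records: dict[int, str]):
        self.shards: dict[int, dict[int, str]] = defaultdict(dict)
        for node_id, value in records.items():
            self.shards[shard_of(node_id)][node_id] = value
        self.round_trips = 0  # simulated network requests issued so far

    def get_many(self, shard: int, ids: list[int]) -> dict[int, str]:
        # One simulated round trip per call, however many ids it carries.
        self.round_trips += 1
        return {i: self.shards[shard][i] for i in ids}


def fetch_naive(cluster: ToyCluster, neighbor_ids: list[int]) -> dict[int, str]:
    # N+1 pattern: the "+1" is the earlier query that produced neighbor_ids;
    # only the N per-neighbor requests are counted here.
    result: dict[int, str] = {}
    for i in neighbor_ids:
        result.update(cluster.get_many(shard_of(i), [i]))
    return result


def fetch_batched(cluster: ToyCluster, neighbor_ids: list[int]) -> dict[int, str]:
    # Group ids by owning shard so each shard is contacted at most once.
    by_shard: dict[int, list[int]] = defaultdict(list)
    for i in neighbor_ids:
        by_shard[shard_of(i)].append(i)
    result: dict[int, str] = {}
    for shard, ids in by_shard.items():
        result.update(cluster.get_many(shard, ids))
    return result


if __name__ == "__main__":
    records = {i: f"node-{i}" for i in range(100)}
    neighbors = list(range(1, 100))  # high fan-out: 99 neighbors
    naive, batched = ToyCluster(records), ToyCluster(records)
    assert fetch_naive(naive, neighbors) == fetch_batched(batched, neighbors)
    print(naive.round_trips, batched.round_trips)  # 99 3
```

Both strategies return the same data; the difference is that round trips scale with the fan-out in the naive version and with the number of shards touched in the batched one.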
Supporting the Community
We at Venrock & Uncorrelated, along with Akon and Gajanan, understood the importance of the open-source community: a passionate and energetic community that believed in the Dgraph vision, wanted to be involved, and deserved renewed stewardship to take the project forward.
When joining, both Akon and Gajanan committed to earning back the moral authority to steward the project, re-engaging the community to actively contribute and help lead the roadmap, and ensuring the open core continues to be supported and funded. That means the open core will keep evolving and growing in functionality, capability, and contributorship, guided by a clear roadmap for graduating features initially designed for enterprises into the open core. Whether you deploy the open-source or enterprise version of Dgraph, it will deliver a world-class developer experience with unparalleled performance and capabilities.
We’ve both long said that the future of the database just needs to work. Scale without firefighting. Abstract out the complexities of schemas, tables, and data structures. And handle whatever is thrown at it, whether you are a front-end developer building an application or a large enterprise tackling fraud detection. We believe Dgraph will be the reference architecture for the modern enterprise.
Both Salil and Ethan are excited to be able to support Akon, Gajanan, and the Dgraph community in delivering on this promise, and bringing a new kind of database to the community.