Reputation: 6413
I was wondering how databases like Dgraph and TigerGraph managed to shard the graph in-order to support horizontal scaling without breaking the connections between nodes besides supports a lot of interesting algorithms.
And they claim to be a native graph solution so an approach like facebook or twitter for example is not the case here.
The only solution that come to my mind is by spreading the graph among so many small databases, which leads to so many nodes duplication to maintain the relationships.
Any ideas ?
Thanks in advance
Upvotes: 7
Views: 2347
Reputation: 41
This is a critical question, and a weakness of large graph databases. Most of them use read-only replicas or do have issues with too many hops across a network.
But you asked about Dgraph which is actually completely different and does not break a graph up into disconnected or even overlapping sub-graphs on different servers. Instead, it stores entire predicates on individual servers, and this allows a given query to execute in a small number of network hops.
See the "developers in sf who use vim" example here: https://dgraph.io/blog/post/db-sharding/ .
Upvotes: 4
Reputation: 168
So technically there are two principles to follow regarding graph sharding. The first one is Edge-Cut which cuts an edge into two parts (incoming and outgoing) and stores them in different servers respectively. Each vertex associated with the edge is distributed to a specific server in the cluster. Nebula Graph, a distributed graph database, follows this method. The second one is Vertex-Cut, which cuts a vertex into N parts (depending on how many edges the vertex has) and stores them in different servers. Each edge associated with the vertex is then distributed to a specific server in the cluster. GraphX did it this way.
However, graph sharding is an NP problem anyway, which is way much harder than sharding in SQL. So some vendors might do it differently than Cut-Edge only or Cut-Vertex only. For example, your thought, i.e. spreading subgraph, is somewhat like Neo4j Fabric. Some vendors place the whole graph structure, not including the properties, into a single host memory so that fetching subgraphs is very fast. While some vendors adopt adjacency list to separate nodes and edges in the graph, without considering too much for the locality.
Upvotes: 7