Reputation: 3734
With key-value, document, and column-family databases, I understand you can scale out with combinations of replication and sharding in the keyspace. But, with common graph operations like shortest path, etc. -- these don't really seem to gain any benefit from replication...and I can't see how you would shard a graph database without finding an independent subgraph (very difficult).
Are there graph databases that try to tackle this problem? What is the current research in this area?
Upvotes: 21
Views: 7148
Reputation: 1
ArangoDB is a multi-model graph database which scales horizontally like a document store also for graphs. It follows the hybrid-index approach to graphs.
With the SmartGraph feature, one can shard a graph dataset by a user defined sharding key (e.g. region, customer, category or any other property) and vertices as well as their edges get distributed to the same machine. The query engine then knows where the data needed for a given query resides, sends the request to the needed machines and executes the query locally. For many scale-out use cases, this can be a suitable solution .https://www.arangodb.com/why-arangodb/arangodb-enterprise/arangodb-enterprise-smart-graphs/
Upvotes: 0
Reputation: 11
Check out http://thinkaurelius.com/
For Titan they use Cassandra, HBase, or BerkeleyDB as the backing store which comes inherently with the store's scalability characteristics.
Upvotes: 0
Reputation: 8885
GoldenOrb was a concept that aimed to create a horizontally scalable Graph Database. It was released as open source, but the project appears to be dead now (link to GitHub is offline). It was based on Hadoop.
Even though, such model cannot be considered yet a fully capable graph database, given the amount of information that needs to be shared among nodes is too large and complex for certain use cases of graph databases. Evolution of computing, tiered caching layered architectures, could allow it to be considered fully scalable and a de-fact Graph database.
So overall, the answer to this date is "no", not fully.
The original web site hosting the project is this: http://goldenorbos.org
Upvotes: 1
Reputation: 2670
Neo4j supports sharding and is trying to tackle with sharding problems. Please take a look at http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx
Upvotes: 3
Reputation: 16174
Replication can be useful for any kind of database - it's just creating multiple copies of data so you can serve more queries than a single server can handle.
Sharding is a little more complex, but actually not too different from key/value or document stores, because internally edges have to be represented as simple lists.
While it is true that finding independent subgraphs is impossible in most cases, it isn't actually necessary. As long as the node processing the query is able to get data from other nodes, having the data available locally is just a performance optimisation.
Once you have that set up, you have a lot of options for optimising performance based on the type of graph you are working with - for example in a social graph you might use location to select the node for a user because you know that most connections are local.
I'm not aware of any existing graph databases that have sharding built in, probably because the problem is much harder to solve for the general case and the small size of edge data means you need a really large graph to exceed the capacity of a single server.
Upvotes: 8