Dong

Reputation: 11

Why are there not as many graph databases as graph processing frameworks?

I know of some graph databases, especially active and distributed ones, but not many: OrientDB, Titan, DEX, etc.

As for graph processing frameworks, there is a huge set of tools such as GraphX, GraphLab, PowerGraph, X-Stream, Pregel, etc., and more come out every year.

Can anyone tell me the difference between these two categories of tools? Are they interchangeable? And why don't graph databases draw as much attention as graph processing frameworks?

Upvotes: 1

Views: 202

Answers (2)

asap diablo

Reputation: 178

Graph databases and graph computing are connected in some ways and isolated from each other in others.

Connections: A graph database offers not only data storage but also some graph data processing. For example, solving the single-source shortest path (SSSP) problem requires traversing the graph and computing over it, which is the kind of work a graph processing framework supports.
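To make the SSSP example concrete, here is a minimal sketch of Dijkstra's algorithm over a small in-memory adjacency map. The graph data and the function name `shortest_paths` are made up for illustration, and non-negative edge weights are assumed. A real graph database or processing framework would expose this through its own query language or API rather than code like this.

```python
import heapq

def shortest_paths(graph, source):
    """Return the shortest distance from source to every reachable vertex.

    graph maps each vertex to a list of (neighbor, weight) pairs.
    Assumes all weights are non-negative (Dijkstra's algorithm).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical example graph: vertex -> [(neighbor, edge weight), ...]
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}

print(shortest_paths(graph, "a"))
```

The traversal (following edges) and the computation (tracking the best distance so far) are interleaved, which is why SSSP sits on the boundary between a query and a processing job.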

Isolations: You can't use a graph database for most graph computing jobs, such as PageRank or greedy graph coloring, because a graph database is basically a storage and query system and doesn't need to be able to run such computing jobs.
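As a rough illustration of why PageRank is a different kind of workload from a single lookup or traversal, here is a minimal power-iteration sketch. The function name `pagerank`, the sample graph, and the parameter defaults are assumptions for illustration only, not the implementation of any particular framework. Every iteration touches every vertex and every edge, which is the kind of whole-graph batch job that processing frameworks are designed to distribute.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Simple PageRank by power iteration.

    out_links maps each vertex to the list of vertices it links to.
    Every link target must also appear as a key in out_links.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every vertex starts each round with the "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            # A vertex with no out-links spreads its rank over all vertices.
            targets = out_links[u] or nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank

# Hypothetical example graph: vertex -> vertices it links to
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(links)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 4))
```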

Correct me if I'm wrong; I'm also new to graph computing.

Upvotes: 0

smolinari

Reputation: 701

The difference between graph databases and graph processing frameworks is that databases are built to store data in the basic form of a graph, where the data points are nodes/vertices and the relationships between them are edges. Some databases, like OrientDB, extend this basic concept considerably to make the database much more versatile. Others are less versatile. In general, though, the main goal is to persist the data in a graph-like form: edges and vertices.

Graph processing frameworks, on the other hand, take a set of data and build analytical graphs out of it. The goal is mainly the analysis of graph-like patterns or structures within the data.
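A minimal sketch of what "building a graph out of the data" can look like, assuming the input is a list of flat pair records. The records and the helper name `build_graph` are hypothetical. Note that the resulting graph lives only in memory; nothing here persists it.

```python
from collections import defaultdict

# Hypothetical flat records, e.g. rows from a log file or a relational table.
records = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("dave", "alice"),
]

def build_graph(rows):
    """Build an undirected, in-memory adjacency map from (source, target) rows."""
    adjacency = defaultdict(set)
    for src, dst in rows:
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency

graph = build_graph(records)

# A simple analysis over the derived graph: how many neighbors each vertex has.
degree = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
print(degree)
```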

I'll try to put this in an analogy, as I understand it.

Say you have a punch bowl full of punch (your data).

In a graph database scenario, the punch is already a graph and you can look into the bowl and see all the stuff in your graph and analyze it too.

With a graph processing framework, you have a punch bowl full of stuff too, but it is murky and you don't see any graphs in it directly. To get a graph of some type, you first have to ladle out some of the punch with, let's say, a "graph processing ladle". This lets you see some kind of graph coherence, depending on the algorithms you choose to analyze the data with. Of course, depending on your machine or system (Spark, for example), the graph processing ladle could be huge, as big as your whole punch bowl or even bigger.

Still, it takes time and processing to make a "sensible graph" out of the punch (your data). The other thing about this is, if you want to store this newly found ladle of analyzed graph punch, you'd have to have another bowl to put it in. And, if you drop the ladle on the floor, your graph data is gone. This wouldn't happen with a graph database.

I hope that makes sense.

Scott

Upvotes: 2
