Sai Sri Krishna Kotha
Sai Sri Krishna Kotha

Reputation: 113

What is the time complexity of search query in Graph database?

What is the time complexity of search query in Graph database (especially Neo4j) ?

I'm having relational data with me. I'm confused to use a Relational database or Graph database to store that data. So, I want to store the data based on the performance and time complexity of the queries for that particular database. But, I'm unable to find the performance and time complexity of the queries for Graph database.

Can anyone help me out ?

Upvotes: 3

Views: 3690

Answers (3)

InverseFalcon
InverseFalcon

Reputation: 30397

The answer isn't so simple because the time complexities typically depend upon what you're doing in the query (the results of the query planner), there isn't a one-size-fits-all time complexity for all queries.

I can speak some for Neo4j (disclaimer: I am a Neo4j employee).

I won't say much on Lucene index lookups in Neo4j, as these are typically performed only to find starting nodes by index, and represent a fraction of execution time of a query. Relationship traversals tend to be where the real differences show themselves.

Once starting nodes are found, Neo4j walks the graph through relationship traversal, which for Neo4j is basically pointer chasing through memory, which tends to be where native graph dbs outperform relational dbs: The cost of chasing pointers is constant per traversal.

For relational dbs (including graph layers built on top of relational dbs), relationship traversal is usually accomplished by various table join algorithms, which have their own time complexity, typically O(log n) when the join is backed by a B-tree index. This can be quite efficient, but we are in the age of big data and data lakes, so data is getting larger, and the efficiency of the join does grow based on the data in the tables being joined. And this is fine for smaller numbers of joins, but there are some queries that require many joins (sometimes we won't have an upper bound on when to stop joining), and we may want to traverse through many kinds of tables/nodes (and sometimes we may not care what kind they are), so we may need the capability to join to or through any table arbitrarily, which isn't easily expressed or performed in a relational db.

This makes native graph databases like Neo4j well-positioned for handling queries over connected data, especially with a non-trivial or growing number of relationship traversals (or if the traversals are unbounded, such as for reachabilty queries, shortest-path, and others). The cost for queries is associated with the part of the graph touched or walked by the query, not the total number of nodes in the database, so it helps when you can adequately constrain the query to touch the smallest possible subgraph in the db.

As far as your question of whether to use a relational or graph database, that depends upon the connectedness of your data and the queries you plan to run.

If you do decide upon a graph database, then you have choices here as well, and a different set of criteria, such as native vs non-native implementation (Neo4j is a native graph database and takes advantage of index-free adjacency for relationship traversals), whether you need ACID (Neo4 is an ACID database), and if a rich and expressive query language is desired (Cypher is Neo4j's query language, feel free to learn and compare against others).

For more in-depth info, here's a DZone article on Why Graph Databases Outperfrom RDBMS on Connected Data, and an article on Explaining Graph Databases to a Developer by Neo4j's Chief Scientist Jim Webber.

Upvotes: 7

Oleg Gritsak
Oleg Gritsak

Reputation: 622

Actually, the most probable scenario is to use both Neo4j and some DBMS (relational or nosql like Mongo). Because it is too hard to store all dataset in Neo4j.

Speed-wise traversing nodes in DBMS is 10-100++ times slower than Neo4j. Also Neo4j has built-in shortestPath (and many other) methods.

Also, can mention hybrid solutions, like ArangoDB. It has graph engine + document-based engine. But under the hood it is two separate tables, so it is almost as inconvenient as Neo4j+DBMS.

Upvotes: 2

techie95
techie95

Reputation: 515

It would actually depend upon the size of your data and the complexity.

In Graph databases like neo4j the time complexity is depends on the kind of query and on the planners(executors) behind the query. Graph database perticularly Neo4j performs easier JOINS which give us a clear view of data.

For more information please visit this reference blog by Neo4j. And you can also refer to this question as it is similar to yours.

Hope this helps!

Upvotes: 0

Related Questions