Hariharan Thiagarajan

Reputation: 11

Neo4j Huge database query performance configuration

I am new to Neo4j and graph databases. That said, I have around 40,000 independent graphs loaded into a Neo4j database using batch insertion, and so far everything has gone well. My current database folder is 180 GB. The problem is query performance, which is far too slow: even counting the number of nodes takes forever. I am using a server with 1 TB of RAM and 40 cores, so I would like to load the entire database into memory and run my queries against it there.

I have looked into the configuration options but I am not sure which changes would cache the entire database in memory. Please suggest which properties I should modify.

I also noticed that most of the time Neo4j uses only one or two cores. How can I make it use more?

I am using the free version for a university research project, so I am unable to use the High-Performance Cache. Is there an alternative in the free version?


My solution: I added more graphs to my database, and it is now 400 GB with more than a billion nodes. Following Stefan's comments, I used the Java API to access the database and moved it onto a RAM disk. It now takes about 3 hours to walk through all the nodes and collect information from each one.

RAM disk and Java APIs gave a big boost in performance.
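For reference, a minimal sketch of that kind of full walk with the embedded Java API, assuming a Neo4j 2.x store and a hypothetical RAM-disk path (/mnt/ramdisk/graph.db); the "name" property read is only illustrative:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;
    import org.neo4j.tooling.GlobalGraphOperations;

    public class WalkAllNodes {
        public static void main(String[] args) {
            // Open the store that was copied onto the RAM disk (hypothetical path).
            GraphDatabaseService db = new GraphDatabaseFactory()
                    .newEmbeddedDatabase("/mnt/ramdisk/graph.db");
            long visited = 0;
            try (Transaction tx = db.beginTx()) {
                // Stream over every node instead of materializing them all at once.
                for (Node node : GlobalGraphOperations.at(db).getAllNodes()) {
                    // Collect whatever per-node information is needed; "name" is illustrative.
                    Object name = node.getProperty("name", null);
                    visited++;
                }
                tx.success();
            } finally {
                db.shutdown();
            }
            System.out.println("Visited " + visited + " nodes");
        }
    }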

Upvotes: 1

Views: 1204

Answers (2)

Michael Hunger

Reputation: 41676

What Neo4j version are you using?

Please share your current config (conf/* and data/graph.db/messages.log). You can also use the personal edition of Neo4j Enterprise.

What kinds of use cases do you want to run?

Counting all nodes is probably not your main operation (there are ways in the Java API that make it faster).

For efficient multi-core usage, run multiple clients concurrently, or write Java code that uses thread pools to spread the traversal across more cores.
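A rough sketch of that idea (not code from the answer itself), assuming an embedded Neo4j 2.x database: the node ID range is split across a fixed thread pool, each worker counting the nodes in its slice inside its own read transaction. The store path, ID upper bound, and thread count are placeholders to adjust:

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.NotFoundException;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelNodeCount {
        public static void main(String[] args) throws Exception {
            GraphDatabaseService db = new GraphDatabaseFactory()
                    .newEmbeddedDatabase("/mnt/ramdisk/graph.db"); // hypothetical path

            long highestId = 1_200_000_000L; // upper bound on node IDs, adjust to your store
            int threads = 32;                // placeholder thread count
            long chunk = highestId / threads + 1;

            ExecutorService pool = Executors.newFixedThreadPool(threads);
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long start = t * chunk;
                final long end = Math.min(start + chunk, highestId);
                Callable<Long> slice = () -> {
                    long count = 0;
                    // Each worker runs its own read transaction over its ID slice.
                    try (Transaction tx = db.beginTx()) {
                        for (long id = start; id < end; id++) {
                            try {
                                db.getNodeById(id); // touches the node, warming the cache
                                count++;
                            } catch (NotFoundException ignored) {
                                // gaps in the ID space are expected; skip them
                            }
                        }
                        tx.success();
                    }
                    return count;
                };
                parts.add(pool.submit(slice));
            }

            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            pool.shutdown();
            db.shutdown();
            System.out.println("Counted " + total + " nodes");
        }
    }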

Upvotes: 0

Stefan Armbruster

Reputation: 39915

Counting nodes in a graph is a global operation that obviously has to touch each and every node. If the caches are not populated (or not configured to fit your dataset), the speed of your hard disk becomes the dominant factor.

To speed things up, make sure the caches are configured appropriately, see http://neo4j.com/docs/stable/configuration-caches.html.
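As a rough sketch only: on a 2.1-era install, conf/neo4j.properties sizes the memory-mapped store files and selects the object cache, while the JVM heap is set in conf/neo4j-wrapper.conf. The values below are purely illustrative for a large store on a big-RAM machine, and in Neo4j 2.2+ the mapped_memory settings are replaced by a single dbms.pagecache.memory setting:

    # conf/neo4j.properties -- illustrative values, pre-2.2 settings
    # Memory-mapped I/O for the store files
    neostore.nodestore.db.mapped_memory=20G
    neostore.relationshipstore.db.mapped_memory=100G
    neostore.propertystore.db.mapped_memory=50G
    neostore.propertystore.db.strings.mapped_memory=50G
    neostore.propertystore.db.arrays.mapped_memory=10G

    # Object cache: community edition offers none/soft/weak/strong,
    # 'hpc' (high-performance cache) is Enterprise-only
    cache_type=soft

    # conf/neo4j-wrapper.conf -- JVM heap for the server process (values in MB)
    wrapper.java.initmemory=65536
    wrapper.java.maxmemory=65536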

With current versions of Neo4j, a Cypher query traverses the graph single-threaded. Since most graph applications are used concurrently by multiple users, that model still saturates the available cores.

If you want to run a single query multithreaded, you need to use the Java API.

In general, the Neo4j community edition has some limitations in scaling beyond 4 cores (the Enterprise edition has a more performant lock manager implementation). The HPC (high-performance cache) in the Enterprise edition also significantly reduces the impact of full garbage collections.

Upvotes: 0
