David Fox

Reputation: 116

Does relationship creation order affect query performance in Neo4j?

I'm using a batch inserter to create a database with about 1 billion nodes and 10 billion relationships. I've read in multiple places that it is preferable to sort the relationships by min(from, to) before inserting them (which I didn't do), but I haven't grasped why this practice is optimal. I originally thought it only aided insertion speed, but when I turned the database on, traversal was very slow. I realize there can be many reasons for that, especially with a database this size, but I want to be able to rule out the way I'm storing relationships.
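
To make the ordering concrete, here is a minimal sketch of sorting by min(from, to). It assumes relationships are plain `(from_id, to_id)` tuples that fit in memory; the function name and the tie-break on max(from, to) are illustrative choices, not something taken from the Neo4j documentation. At 10 billion relationships the same key would have to be applied with an external, on-disk sort.

```python
def sort_relationships(rels):
    """Order (from_id, to_id) pairs by min(from, to), then by max(from, to).

    The secondary key is an assumption made here so the result is
    deterministic; only the min(from, to) ordering comes from the question.
    """
    return sorted(rels, key=lambda r: (min(r[0], r[1]), max(r[0], r[1])))


# Hypothetical sample data: node ids are made up for illustration.
rels = [(7, 2), (1, 9), (5, 1), (2, 3)]
ordered = sort_relationships(rels)
print(ordered)
```

With this ordering, relationships that touch the same low-numbered node are created next to each other, so they are likely to end up in neighbouring records of the relationship store.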

Main question: does it kill traversal speed to insert relationships in a very "random" order because of where they will be stored on disk? I'm thinking that maybe when it tries to traverse nodes, the relationships are too fragmented. I hope someone can enlighten me about whether this would be the case.

UPDATES:

Some of my system details, in case they are needed:

- Neo4j 1.9.RC1
- Linux server with 128 GB of RAM, 8 cores, and a non-SSD hard drive

Upvotes: 3

Views: 498

Answers (1)

Aditya

Reputation: 2246

I have not worked with Neo4j on such a large scale, but as far as I know this won't make much difference to the speed. Could you provide any links that state that the order of insertion matters?

What matters in this case is whether the relationships are cached. Until the cache is fairly well populated, performance will be on the slower side. You should also set an appropriate cache size as soon as the index is created.
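
As a rough way to judge what an appropriate cache size would be, the store sizes can be estimated from the record counts. This is only a back-of-the-envelope sketch: the fixed record sizes used here (9 bytes per node, 33 bytes per relationship) are assumptions about the 1.9 store format, so check them against the actual sizes of the `neostore.nodestore.db` and `neostore.relationshipstore.db` files.

```python
# Assumed fixed record sizes for the Neo4j 1.9 store format (verify against
# the real store files; these constants are not taken from the question).
NODE_RECORD_BYTES = 9
REL_RECORD_BYTES = 33

RAM_GIB = 128  # from the system details in the question


def store_size_gib(record_count, record_bytes):
    """Estimated store file size in GiB for fixed-size records."""
    return record_count * record_bytes / 2**30


node_store = store_size_gib(1_000_000_000, NODE_RECORD_BYTES)
rel_store = store_size_gib(10_000_000_000, REL_RECORD_BYTES)

print(f"node store:         {node_store:.1f} GiB")
print(f"relationship store: {rel_store:.1f} GiB")
print("relationship store fits in RAM:", rel_store <= RAM_GIB)
```

Under these assumptions the node store is small enough to cache, but the relationship store is larger than the machine's RAM, so part of any wide traversal will have to go to disk regardless of the cache settings.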

You should read this link regarding Neo4j performance.

Read the Neo4j documentation on batch insertion and these SO questions for help with bulk inserts, if you haven't already.

Upvotes: 1
