Reputation: 3004
I'm new to Neo4j and there must be something I don't understand about the basics.
I have many objects in Java that I want to use to populate a Neo4j graph, through the Java driver and Cypher. My code works like this:
// nodes
for ( Person person: persons )
  session.run ( String.format (
    "CREATE ( :Person { id: '%s', name: \"%s\", surname: \"%s\" })",
    person.getId(), person.getName(), person.getSurname ()
  ));
// relations
session.run ( "CREATE INDEX ON :Person(id)" );
for ( Friendship friendship: friendships )
  session.run ( String.format (
    "MATCH ( from:Person { id: '%s' } ), ( to:Person { id: '%s' } )\n" +
    "CREATE (from)-[:KNOWS]->(to)\n",
    friendship.getFrom().getId(),
    friendship.getTo().getId()
  ));
(In reality it's slightly more complicated, because I have a dozen node types and about the same number of relation types.)
Now, this is very slow: more than 1 hour to load 300k nodes and 1M relations (on a fairly fast MacBook Pro, with Neo4j taking 12/16GB of RAM).
Am I doing it the wrong way? Should I use the batch inserter instead? (I would prefer to be able to access the graph DB over the network.) Would I gain something by grouping more insertions into one transaction? (From the documentation, it seems transactions are only useful for rolling back and for isolation needs.)
Upvotes: 3
Views: 1360
Reputation: 3004
I think it's worth reporting my experience on this.
I've followed the @sjc suggestion and tried with UNWIND. However, that wasn't so simple, because Cypher doesn't allow you to parameterise node labels or relation types (and I have a dozen labels and relation types). But eventually, I was able to loop over all possible types and send enough items (about 1000) to each UNWIND chunk.
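A minimal sketch of that per-label chunking, in Java and using only the JDK so that it stands alone. The names `UnwindBatches`, `createNodesStatement` and `chunks` are hypothetical, the label is spliced into the query text precisely because it can't be a parameter, and the `$rows` parameter form is an assumption about the server version (older examples write `{rows}`). The driver call appears only as a comment and has not been run here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class UnwindBatches {

    // Labels cannot be Cypher parameters, so the label goes into the query text.
    // This is plain string concatenation: only pass labels from a fixed, trusted set.
    public static String createNodesStatement(String label) {
        return "UNWIND $rows AS row CREATE (n:`" + label + "`) SET n = row";
    }

    // Splits the items into consecutive chunks of at most `size` elements.
    public static <T> List<List<T>> chunks(List<T> items, int size) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            result.add(new ArrayList<>(items.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative data: each map holds the properties of one node.
        List<Map<String, Object>> people = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            people.add(Map.of("id", "p" + i, "name", "Name" + i));
        }
        String statement = createNodesStatement("Person");
        for (List<Map<String, Object>> chunk : chunks(people, 1000)) {
            // With the Neo4j Java driver this would be roughly (assumption):
            // session.run(statement, Collections.singletonMap("rows", chunk));
            System.out.println(statement + " <- " + chunk.size() + " rows");
        }
    }
}
```

One statement string per label (and one per relation type) is built once, then reused for every chunk of that type.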
The code using UNWIND is much faster, yet still not fast enough, in my opinion (it should be OK on a decent PC with a few million nodes, but not with hundreds of millions of nodes or more).
The batch inserter is much faster (a few seconds to upload 1-2 million nodes), although it requires bringing HTTP access down. I've also had a lot of problems with its dependency on Lucene 5.4, because I need to use it inside an application (the one producing the data) that uses Lucene 6, and awful things happened when I simply swapped 5.4 for 6 in the classpath. I've read that there is some mechanism to make this possible, but it doesn't seem easy and it certainly isn't well documented.
I definitely didn't expect so much trouble for executing such a basic operation efficiently.
Upvotes: 1
Reputation: 1137
I'm coming from Neo4j in Python, but I think the issue here is with your Cypher commands. I have two suggestions.
It may be faster to MATCH the two nodes in separate clauses. On my primitive benchmark I see a difference of 24ms vs 15ms with this (EDIT: This benchmark is dubious):
MATCH ( from:Person { id: '%s' } )
MATCH ( to:Person { id: '%s' } )
CREATE (from)-[:KNOWS]->(to)
Another option is to use UNWIND. I use this with the Bolt interface to send fewer transactions, without using the batch inserter. Forgive the Python implementation I'm copying here; hopefully you can read it alongside the Java Neo4j driver docs and convert it.
payload = {"list": [{"a": "Name1", "b": "Name2"}, {"a": "Name3", "b": "Name4"}]}
statement = "UNWIND {list} AS d "
statement += "MATCH (A:Person {name: d.a}) "
statement += "MATCH (B:Person {name: d.b}) "
statement += "MERGE (A)-[:KNOWS]-(B) "
tx = session.begin_transaction()
tx.run(statement, payload)
tx.commit()
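For the Java driver, the same payload can be built as a map holding a list of maps. The sketch below uses only the JDK; `FriendshipBatch` and `toPayload` are hypothetical names, matching on `id` (as in the question) instead of `name` is my assumption, and so is using a directed CREATE rather than MERGE. The commented-out driver calls have not been executed here, and `$list` is the newer spelling of the `{list}` parameter above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FriendshipBatch {

    // One statement handles a whole batch of relationships.
    public static final String STATEMENT =
        "UNWIND $list AS d "
        + "MATCH (a:Person {id: d.a}) "
        + "MATCH (b:Person {id: d.b}) "
        + "CREATE (a)-[:KNOWS]->(b)";

    // Turns {fromId, toId} pairs into the parameter map expected by STATEMENT.
    public static Map<String, Object> toPayload(List<String[]> pairs) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String[] pair : pairs) {
            Map<String, Object> row = new HashMap<>();
            row.put("a", pair[0]);
            row.put("b", pair[1]);
            rows.add(row);
        }
        Map<String, Object> payload = new HashMap<>();
        payload.put("list", rows);
        return payload;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"p1", "p2"});
        pairs.add(new String[] {"p3", "p4"});
        Map<String, Object> payload = toPayload(pairs);
        // With the Neo4j Java driver, one transaction per batch would look roughly like:
        // try (Transaction tx = session.beginTransaction()) {
        //     tx.run(STATEMENT, payload);
        //     tx.success();   // driver 1.x; later driver versions use tx.commit()
        // }
        System.out.println(payload);
    }
}
```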
Upvotes: 1