Neo4j, bulk load with Cypher commands

Question

I'm new to Neo4j and there must be something I don't understand about the basics.

I have many objects in Java and I want to use them to populate a Neo4j graph, using the Java driver and Cypher. My code works like this:

// nodes
for ( Person person: persons )
  session.run ( String.format ( 
    "CREATE ( :Person { id: '%s', name: \"%s\", surname: \"%s\" })",
    person.getId(), person.getName(), person.getSurname ()
  ));

// relations
session.run ( "CREATE INDEX ON :Person(id)" );

for ( Friendship friendship: friendships )
  session.run ( String.format ( 
    "MATCH ( from:Person { id: '%s' } ), ( to:Person { id: '%s' } )\n" +
    "CREATE (from)-[:KNOWS]->(to)\n",
    friendship.getFrom().getId(), 
    friendship.getTo().getId() 
  ));

(In reality it's slightly more complicated, because I have a dozen node types and about the same number of relation types.)

Now, this is very slow, like more than 1 hour to load 300k nodes and 1M relations (on a fairly fast MacBookPro, with Neo4j taking 12/16GB of RAM).

Am I doing it the wrong way? Should I use the batch inserter instead? (I would prefer to be able to access the graph DB over the network.) Would I gain something by grouping more insertions into one transaction? (From the documentation, it seems transactions are only useful for rolling back and for isolation needs.)

sjc · Accepted Answer

I'm coming from Neo4j in Python, but I think the issue here is with your Cypher commands. I have two suggestions.

It may be faster to match the two nodes in separate MATCH clauses. On my primitive benchmark I see a difference of 24ms vs 15ms with this (EDIT: This benchmark is dubious):

MATCH ( from:Person { id: '%s' } )
MATCH ( to:Person { id: '%s' } )
CREATE (from)-[:KNOWS]->(to)

Another option is to use UNWIND. I use this with the Bolt interface to send many rows in a single statement, and therefore far fewer transactions, without using the Batch Inserter. Forgive the Python implementation I'm copying here; hopefully you can look at this along with the Neo4j Java Driver docs to convert it.

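# One parameter holding every pair to connect
payload = {"list": [{"a": "Name1", "b": "Name2"}, {"a": "Name3", "b": "Name4"}]}

# $list is the current parameter syntax; the legacy {list} form was removed in Neo4j 4.0
statement = "UNWIND $list AS d "
statement += "MATCH (A:Person {name: d.a}) "
statement += "MATCH (B:Person {name: d.b}) "
statement += "MERGE (A)-[:KNOWS]-(B) "

# A single transaction covers the whole list
tx = session.begin_transaction()
tx.run(statement, payload)
tx.commit()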
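
For the Java side, here is a rough sketch of the same UNWIND idea, not a tested loader. It assumes the friendships have already been reduced to (from id, to id) string pairs, matches on id as in the question, and uses the `$list` parameter form (older servers may need `{list}` instead). The class name `FriendshipBatchLoader`, the row keys `fromId`/`toId` and the `runner` callback are all made up for illustration; the callback stands in for the driver call so the batching logic can be checked without a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical helper: sends KNOWS relationships in batches via one UNWIND statement. */
class FriendshipBatchLoader {

  // Same shape as the Python statement above, but matching on id and creating a directed edge.
  static final String STATEMENT =
      "UNWIND $list AS d\n"
    + "MATCH (a:Person {id: d.fromId})\n"
    + "MATCH (b:Person {id: d.toId})\n"
    + "CREATE (a)-[:KNOWS]->(b)";

  /** Splits (fromId, toId) pairs into parameter maps holding at most batchSize rows each. */
  static List<Map<String, Object>> toPayloads(List<String[]> idPairs, int batchSize) {
    List<Map<String, Object>> payloads = new ArrayList<>();
    for (int start = 0; start < idPairs.size(); start += batchSize) {
      int end = Math.min(start + batchSize, idPairs.size());
      List<Map<String, Object>> rows = new ArrayList<>();
      for (String[] pair : idPairs.subList(start, end)) {
        Map<String, Object> row = new HashMap<>();
        row.put("fromId", pair[0]);
        row.put("toId", pair[1]);
        rows.add(row);
      }
      Map<String, Object> payload = new HashMap<>();
      payload.put("list", rows); // bound to $list in STATEMENT
      payloads.add(payload);
    }
    return payloads;
  }

  /** Calls runner once per batch and returns the number of batches sent. */
  static int load(List<String[]> idPairs, int batchSize,
                  BiConsumer<String, Map<String, Object>> runner) {
    List<Map<String, Object>> payloads = toPayloads(idPairs, batchSize);
    for (Map<String, Object> payload : payloads) {
      runner.accept(STATEMENT, payload);
    }
    return payloads.size();
  }
}
```

With the Java driver, the callback could be something like `(query, params) -> session.run(query, params)`, so each batch becomes one statement instead of one statement per friendship. The batch size is something to tune by experiment; I have no measured value to recommend.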
