Alex

Reputation: 1081

Out of memory when creating large number of relationships

I'm new to Neo4j, and I want to try it on some data I've exported from MySQL. I've got the community edition running with neo4j console, and I'm entering commands using the neo4j-shell command line client.

I have two CSV files that I use to create two types of nodes, as follows:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/updates.csv" AS row
CREATE (:Update {update_id: row.id, update_type: row.update_type, customer_name: row.customer_name, .... });

CREATE INDEX ON :Update(update_id);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:/tmp/facts.csv" AS row
CREATE (:Fact {update_id: row.update_id, status: row.status, ..... }); 

CREATE INDEX ON :Fact(update_id);

This gives me approximately 650,000 Update nodes and 21,000,000 Fact nodes.

Once the indexes are online, I try to create relationships between the nodes, as follows:

MATCH (a:Update)
WITH a
MATCH (b:Fact{update_id:a.update_id})
CREATE (b)-[:FROM]->(a)

This fails with an OutOfMemoryError. I believe this is because Neo4j does not commit the transaction until it completes, so the entire transaction state is held in memory.

What can I do to prevent this? I have read about USING PERIODIC COMMIT, but it appears this only applies to LOAD CSV queries, and indeed it doesn't work in my case:

neo4j-sh (?)$ USING PERIODIC COMMIT
> MATCH (a:Update)
> WITH a
> MATCH (b:Fact{update_id:a.update_id})
> CREATE (b)-[:FROM]->(a);
QueryExecutionKernelException: Invalid input 'M': expected whitespace, comment, an integer or LoadCSVQuery (line 2, column 1 (offset: 22))
"MATCH (a:Update)"
 ^

Is it possible to create relationships in this way, between large numbers of existing nodes, or do I need to take a different approach?

Upvotes: 3

Views: 533

Answers (1)

Christophe Willemsen

Reputation: 20185

The Out of Memory exception is expected: the query tries to commit everything in a single transaction, and since you didn't say otherwise, I assume your Java heap settings are the defaults (512m).
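As a side note, you can raise the heap. If you're on Neo4j 2.x (which the neo4j-shell usage suggests), the heap is configured in conf/neo4j-wrapper.conf; the values below are only an illustration, adjust them to the RAM you can spare (on 3.x the equivalent settings are dbms.memory.heap.initial_size and dbms.memory.heap.max_size in conf/neo4j.conf):

# conf/neo4j-wrapper.conf -- example values only
wrapper.java.initmemory=2048
wrapper.java.maxmemory=2048

Even with a larger heap, committing 21 million relationships in one transaction may still not fit, so batching remains the safer route.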

You can, however, batch the process with a kind of pagination. I would also prefer MERGE over CREATE in this case, so that re-running a batch does not create duplicate relationships:

MATCH (a:Update)
WITH a
SKIP 0
LIMIT 50000
MATCH (b:Fact{update_id:a.update_id})
MERGE (b)-[:FROM]->(a)

Increase SKIP by the batch size after each run (keeping LIMIT at 50,000) until you have worked through all 650k Update nodes.
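For example, the second batch (a sketch, assuming you keep the batch size at 50,000) would be:

MATCH (a:Update)
WITH a
SKIP 50000
LIMIT 50000
MATCH (b:Fact {update_id: a.update_id})
MERGE (b)-[:FROM]->(a);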

Upvotes: 7
