nmkkannan

Reputation: 1303

Neo4j Bulk Data - Create Relationship [OutOfMemory Exception]

I am using a Neo4j procedure to create relationships on bulk data.

I first inserted all of the data using LOAD CSV:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row 
....

The data set is very large (10M), but the load executed successfully.

My problem is that I want to create many-to-many relationships between all of these nodes,

but I get an OutOfMemory exception when I execute the following query:

MATCH(n1:x{REMARKS :"LATEST"}) MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID CREATE (n1)-[:ENROLLER]->(n2) ;

I have already created indexes and constraints.

Any ideas? Please help.

Upvotes: 0

Views: 92

Answers (1)

stdob--

Reputation: 29167

The problem is that your query runs in a single transaction, which leads to the OutOfMemory exception. This is hard to avoid directly because, at the moment, periodic commits are available only for LOAD CSV. So you can, for example, re-read the CSV after the first load:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///XXXX.csv" AS row 
MATCH (n1:x{REMARKS :"LATEST", DIST_ID: row.DIST_ID})
WITH n1
MATCH(n2:x{REMARKS :"LATEST"}) WHERE n1.DIST_ID=n2.ENROLLER_ID 
CREATE (n1)-[:ENROLLER]->(n2) ;

Or try the trick with periodic committing from the APOC library:

CALL apoc.periodic.commit("
    MATCH (n2:x {REMARKS:'LATEST'}) WHERE exists(n2.ENROLLER_ID)
    WITH n2 LIMIT {perCommit}
    OPTIONAL MATCH (n1:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    WITH n2, collect(n1) as n1s
    FOREACH(n1 in n1s|
       CREATE (n1)-[:ENROLLER]->(n2)
    )
    REMOVE n2.ENROLLER_ID
    RETURN count(n2)", 
    {perCommit: 1000}
)

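P.S. The ENROLLER_ID property is used here as a flag for selecting the nodes that still need processing, which is why the query removes it from each node once its relationships have been created. Of course, you can use a different flag that is set during processing instead.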
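If removing ENROLLER_ID is not acceptable, the same loop can be driven by a separate marker property. The following is only a sketch under assumptions: the property name ENROLLER_LINKED is hypothetical (it does not appear in the question), the REMARKS value 'LATEST' is taken from the question, and the `$perCommit` parameter syntax and `IS NULL` checks assume a reasonably recent Neo4j version with APOC installed.

```cypher
// Sketch: same batching idea, but ENROLLER_ID is kept and a
// hypothetical marker property (ENROLLER_LINKED) flags processed nodes.
CALL apoc.periodic.commit("
    MATCH (n2:x {REMARKS: 'LATEST'})
    WHERE n2.ENROLLER_ID IS NOT NULL AND n2.ENROLLER_LINKED IS NULL
    WITH n2 LIMIT $perCommit
    // Mark the node first so the next batch does not pick it up again
    SET n2.ENROLLER_LINKED = true
    WITH n2
    OPTIONAL MATCH (n1:x {REMARKS: 'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    WITH n2, collect(n1) AS n1s
    FOREACH (n1 IN n1s |
        CREATE (n1)-[:ENROLLER]->(n2)
    )
    // The statement is re-run until this count reaches 0
    RETURN count(n2)",
    {perCommit: 1000}
)
```

Once all relationships exist, the marker can be removed in a similar batched pass if it is no longer needed.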

Or, more precisely, with apoc.periodic.iterate:

CALL apoc.periodic.iterate("
    MATCH (n1:x {REMARKS:'LATEST'})
    MATCH (n2:x {REMARKS:'LATEST'}) WHERE n1.DIST_ID = n2.ENROLLER_ID
    RETURN n1,n2
  ","
    WITH {n1} as n1, {n2} as n2 
    MERGE (n1)-[:ENROLLER]->(n2)
  ", {batchSize:10000, parallel:true}
)
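For readers on a newer server, the same batching can probably be done without APOC. This is a sketch that assumes Neo4j 4.4 or later, where `CALL { ... } IN TRANSACTIONS` is available; it has to run as an auto-commit (implicit) transaction, for example by prefixing it with `:auto` in Neo4j Browser. The batch size of 10000 simply mirrors the value used above.

```cypher
// Sketch, assuming Neo4j 4.4+: commit after every 10000 input rows
// instead of building all relationships in one transaction.
MATCH (n2:x {REMARKS: 'LATEST'})
WHERE n2.ENROLLER_ID IS NOT NULL
CALL {
    WITH n2
    MATCH (n1:x {REMARKS: 'LATEST'})
    WHERE n1.DIST_ID = n2.ENROLLER_ID
    // MERGE keeps the query safe to re-run after a partial failure
    MERGE (n1)-[:ENROLLER]->(n2)
} IN TRANSACTIONS OF 10000 ROWS
```

As with the other variants, an index on DIST_ID for label x helps keep the inner lookup cheap.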

Upvotes: 1
