diegobc11

Reputation: 107

Creating a Graph DB in Neo4j from large CSV file

I am relatively new to Neo4j, so I apologize if this seems trivial. I am trying to import data from a CSV file with a fairly large number of rows (around 2.5 million) using the Neo4j Desktop app, running the import in the Neo4j Browser. The contents of the file follow the format:

entity1, relation, entity2
entity1, relation, entity3
...
entityN, relation, entityM

I have tried using the query:

LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)

but I get a MemoryPoolOutOfMemoryError after it has been running for an hour, so I modified my query to run in 'batches' to free up memory while running:

:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
WITH row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)
} IN TRANSACTIONS

but the query runs for hours, so I don't think it is implemented correctly. What I need is to store this information in a DB so that I can extract the node embeddings afterwards (I don't need to be able to visualize the graph). Is there a better way to load a large list like this? Importing 2.5M records should not take that long, to be honest. Any help is appreciated.

Upvotes: 0

Views: 346

Answers (1)

cybersam

Reputation: 67019

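Make sure you have an index or uniqueness constraint (which also creates an index as a by-product) on :Entity(name), to make the MERGEs of your nodes more efficient. Without one, each MERGE has to scan the existing :Entity nodes to check whether the name is already present, and that gets slower as the graph grows. For instance: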

CREATE CONSTRAINT Entity_name FOR (e:Entity) REQUIRE e.name IS UNIQUE
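
Once the constraint exists, the batched import from the question can be rerun. The following is only a sketch under some assumptions: it reuses the file name and property names from the question, assumes a Neo4j version that supports `CALL { ... } IN TRANSACTIONS` (4.4 or later), and picks a batch size of 10000 rows purely for illustration (the default is 1000). Run each statement separately in the Neo4j Browser.

```cypher
// Optional check: confirm the Entity_name constraint is listed before importing.
SHOW CONSTRAINTS

// Batched import, same logic as in the question.
// The batch size (10000) is an assumed value; tune it to the memory available.
:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
  WITH row
  // With the constraint in place, these lookups on name use the backing index.
  MERGE (entity1:Entity {name: row.entity1})
  MERGE (entity2:Entity {name: row.entity2})
  MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
} IN TRANSACTIONS OF 10000 ROWS
```

If the database already holds nodes from an earlier partial import, creating the uniqueness constraint will fail if any duplicate names exist, so it may be simplest to start from an empty database.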

Upvotes: 1
