Reputation: 107
I am relatively new to Neo4j, so I apologize if this seems trivial. I am trying to import data from a CSV file with a fairly large number of rows (around 2.5 million) using the Neo4j Desktop app, running the query in the Neo4j Browser. The contents of the file follow the format:
entity1, relation, entity2
entity1, relation, entity3
...
entityN, relation, entityM
I have tried using the query:
LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
MERGE (entity1:Entity {name: row.entity1})
MERGE (entity2:Entity {name: row.entity2})
MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
but I get a MemoryPoolOutOfMemoryError after about an hour of running, so I modified my query to run in 'batches' to free up memory as it runs:
:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
  WITH row
  MERGE (entity1:Entity {name: row.entity1})
  MERGE (entity2:Entity {name: row.entity2})
  MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
} IN TRANSACTIONS
but this query also runs for hours, so I don't think it is being executed correctly. What I need is to store this information in a database so that I can extract the node embeddings afterwards (I don't need to be able to visualize the graph). Is there a better way to load a large list like this? Importing 2.5M records should not take this long, to be honest. Any help is appreciated.
Upvotes: 0
Views: 346
Reputation: 67019
Make sure you have an index or uniqueness constraint (which also creates an index as a by-product) on :Entity(name), to make the MERGEs of your nodes more efficient. For instance:
CREATE CONSTRAINT Entity_name FOR (e:Entity) REQUIRE e.name IS UNIQUE
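If you are on Neo4j 4.4 or later, you can also combine this constraint with your batched import and give the batches an explicit size. The sketch below assumes the same file and labels as in your question; the 10000-row batch size is only an illustrative value, not a requirement. Create the constraint first (as its own query):

CREATE CONSTRAINT Entity_name IF NOT EXISTS FOR (e:Entity) REQUIRE e.name IS UNIQUE

Then run the batched load:

:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
  WITH row
  // With the constraint in place, each MERGE is an index lookup on :Entity(name)
  MERGE (entity1:Entity {name: row.entity1})
  MERGE (entity2:Entity {name: row.entity2})
  MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
} IN TRANSACTIONS OF 10000 ROWS

With the index backing the constraint, each MERGE becomes an index lookup instead of a scan over all existing :Entity nodes, which is what makes a 2.5M-row import practical.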
Upvotes: 1