Reputation: 9
I'm using the code below to import the data, but it takes a very long time even though the file is small (only 3.3 MB). Is it possible to modify the code, or use another method, to speed up the import?
code:
LOAD CSV WITH HEADERS FROM 'file:///Musae-Github.csv' as line
WITH toInteger(line.source) AS Source, toInteger(line.destination) AS Destination
MERGE (a:person {name:Source})
MERGE (b:person {name:Destination})
MERGE (a)-[:Freind ]-(b)
RETURN *
Upvotes: 0
Views: 480
Reputation: 12684
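You can use transaction batching to speed up importing CSV files into Neo4j. Also, make sure there is an index on Person.name; without one, every MERGE has to scan all Person nodes to check whether the node already exists. Labels are case-sensitive, so the index must use the same label as the MERGE clauses (Person here, not person).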
CREATE INDEX PersonNameIndex IF NOT EXISTS FOR (p:Person) ON (p.name)
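One way to check that the index is actually picked up is to EXPLAIN a single MERGE once the index is online. This is only a sketch: the value 1 is a placeholder id, and the exact operator names in the plan can differ between Neo4j versions. EXPLAIN does not run the query, so nothing is written.

```cypher
// Placeholder id 1; EXPLAIN only shows the plan and does not create the node.
EXPLAIN
MERGE (a:Person {name: 1})
RETURN a
```

If the plan shows an index seek operator (for example NodeIndexSeek) instead of NodeByLabelScan, the lookup is using the index.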
Below is the updated script:
// The inner file path uses double quotes so it does not end the single-quoted query string.
// parallel is false here: concurrent MERGE on the same nodes can deadlock.
CALL apoc.periodic.iterate('
CALL apoc.load.csv("file:///Musae-Github.csv") YIELD map AS line RETURN line
','
WITH toInteger(line.source) AS Source, toInteger(line.destination) AS Destination
MERGE (a:Person {name:Source})
MERGE (b:Person {name:Destination})
MERGE (a)-[:Friend]-(b)
', {batchSize:10000, iterateList:true, parallel:false});
reference: https://neo4j.com/labs/apoc/4.3/import/load-csv/#_transaction_batching
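If APOC is not installed, the same batching idea can be sketched in plain Cypher with CALL { ... } IN TRANSACTIONS. This is an alternative to the script above, not part of it, and it assumes Neo4j 4.4 or later. The :auto prefix is a Neo4j Browser / cypher-shell command that is needed there because the query manages its own transactions; the batch size of 10000 simply mirrors the value used above.

```cypher
// Assumes Neo4j 4.4+; :auto is a client command (Browser / cypher-shell), not Cypher itself.
:auto LOAD CSV WITH HEADERS FROM 'file:///Musae-Github.csv' AS line
CALL {
  WITH line
  // Same MERGE logic as above, committed every 10000 rows.
  MERGE (a:Person {name: toInteger(line.source)})
  MERGE (b:Person {name: toInteger(line.destination)})
  MERGE (a)-[:Friend]-(b)
} IN TRANSACTIONS OF 10000 ROWS
```

As with the APOC version, this relies on the index on Person.name being in place.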
Upvotes: 1
Reputation: 655
Your code looks fine, although you probably don't need RETURN * at the end; it sends every processed row back to the client.
Try seeing how long it takes just to process the file:
LOAD CSV FROM "yourpathhere" AS line
RETURN linenumber(), datetime(), line
This should be very fast.
Upvotes: 0