cypher - load multiple csv files

Question

I have many CSV files named 0_0.csv, 0_1.csv, 0_2.csv, ..., 1_0.csv, 1_1.csv, ..., z_17.csv.

How can I import them in a loop or something similar?

Also, is my approach sound? (Each file is about 50 MB and the total size is about 100 GB.)

This is my code:

CREATE INDEX ON :name(v);
CREATE INDEX ON :value(v);

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///0_0.csv" AS csv
FIELDTERMINATOR ','
MERGE (n:name {v: csv.name})
MERGE (m:value {v: csv.value})
CREATE (n)-[:kind {v: csv.kind}]->(m);

Dave Bennett · Accepted Answer

You could handle multiple files by constructing the file name inside the query. Unfortunately, this seems to break when combined with the USING PERIODIC COMMIT query hint, so it won't be a good option for you. You could instead write a script that wraps the load query and sends one command per file to bin/cypher-shell. For reference, the file-name construction looks like this:

UNWIND ['0', '1', 'z'] AS outer
UNWIND range(0, 17) AS inner
LOAD CSV WITH HEADERS FROM 'file:///' + outer + '_' + toString(inner) + '.csv' AS csv
FIELDTERMINATOR ','
MERGE (n:name {v: csv.name})
MERGE (m:value {v: csv.value})
CREATE (n)-[:kind {v: csv.kind}]->(m)
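
The wrapper-script route could look something like the sketch below. It is only an illustration: the function name build_statements, the prefix list, and the count of 18 files per prefix are assumptions taken from the example above, and the Cypher is the load query from the question with one literal file name per statement, so that USING PERIODIC COMMIT can stay.

```python
from typing import Iterable, List

# Load statement from the question; {file_name} is filled in per file.
# Doubled braces are literal braces in the generated Cypher.
LOAD_TEMPLATE = """USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS csv
FIELDTERMINATOR ','
MERGE (n:name {{v: csv.name}})
MERGE (m:value {{v: csv.value}})
CREATE (n)-[:kind {{v: csv.kind}}]->(m);"""


def build_statements(prefixes: Iterable[str], count: int) -> List[str]:
    """Return one LOAD CSV statement per file named <prefix>_<index>.csv."""
    return [
        LOAD_TEMPLATE.format(file_name=f"{prefix}_{index}.csv")
        for prefix in prefixes
        for index in range(count)
    ]


if __name__ == "__main__":
    # Assumption: prefixes and file count mirror the example above;
    # adjust both to match the real set of files.
    print("\n\n".join(build_statements(["0", "1", "z"], 18)))
```

If this were saved as, say, generate_loads.py (a hypothetical name), its output could be piped into bin/cypher-shell along with whatever connection options your setup needs.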

As for your actual load query: do your name and value nodes come up multiple times in the files? If they are unique, you would be better off loading the data in multiple passes. Load the nodes first without the indexes, then add the indexes once the nodes are loaded, and then create the relationships as the last step.
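
A rough sketch of that multi-pass ordering, again as a script that only generates the statements. The function name multi_pass_statements is hypothetical, the index syntax is copied from the question, and the plain CREATE in the first pass is valid only under the assumption that every name and value really is unique across all files.

```python
from typing import List


def multi_pass_statements(file_names: List[str]) -> List[str]:
    """Return Cypher statements ordered: nodes, then indexes, then relationships."""
    def source(file_name: str) -> str:
        return (
            "USING PERIODIC COMMIT\n"
            f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS csv\n"
        )

    statements: List[str] = []

    # Pass 1: create nodes with no index in place.
    # Assumes names and values are unique; otherwise duplicates are created.
    for file_name in file_names:
        statements.append(source(file_name) + "CREATE (:name {v: csv.name});")
        statements.append(source(file_name) + "CREATE (:value {v: csv.value});")

    # Pass 2: add the indexes once all nodes are loaded.
    statements.append("CREATE INDEX ON :name(v);")
    statements.append("CREATE INDEX ON :value(v);")

    # Pass 3: look up both endpoints via the indexes and connect them.
    for file_name in file_names:
        statements.append(
            source(file_name)
            + "MATCH (n:name {v: csv.name})\n"
            + "MATCH (m:value {v: csv.value})\n"
            + "CREATE (n)-[:kind {v: csv.kind}]->(m);"
        )
    return statements


if __name__ == "__main__":
    print("\n\n".join(multi_pass_statements(["0_0.csv", "0_1.csv"])))
```

The relationship pass reads each file a second time, which is the price of keeping the node pass free of index maintenance.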

Using CREATE for the :kind relationship will add a new relationship for every row, even when the same two nodes are already connected with the same csv.kind value. If that can happen in your data, you might want to use MERGE instead.

For 100 GB of data, though, if you are starting with an empty database and are looking for speed, I would take a look at bin/neo4j-admin import.
