Reputation: 3
I am trying to import a csv file into Neo4j DB, here is the link for the CSV file http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv
The query that I am using to import the CSV data into DB is
USING PERIODIC COMMIT 10000
LOAD CSV FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
MERGE (i:Intermediateries{name:line[1],internal_id:line[2],address:line[3],valid_until:line[4],country_codes:line[5],countries:line[6],status:line[7],node_id:line[8],sourceID:line[9]})
The Query produces the following error:- Cannot merge node using null property value for sourceID
I found some useful resources but given the size of ICIJ Panama Papers Size it will take hours to execute, is there any way to eliminate checking for NULL values and optimize the query?
Upvotes: 0
Views: 1898
Reputation: 36599
You are using the indexing wrong. Index starts from 0, not 1. Also there is a header line in the given csv. So edit the Cypher query accordingly.
Change it to the following:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
MERGE
(i:Intermediateries{name:line[0],internal_id:line[1],address:line[2],valid_until:line[3],country_codes:line[4],countries:line[5],status:line[6],node_id:line[7],sourceID:line[8]})
See the official documentation for more details - http://neo4j.com/docs/developer-manual/current/cypher/clauses/load-csv/#load-csv-import-data-from-a-csv-file
Upvotes: 2
Reputation: 6514
yes you can easily filter out rows with sourceID null
USING PERIODIC COMMIT 10000 LOAD CSV FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
WITH line where line[9] is not null
MERGE (i:Intermediateries{name:line[1],internal_id:line[2],address:line[3],
valid_until:line[4],country_codes:line[5],countries:line[6],status:line[7],node_id:line[8],sourceID:line[9]})
You can also use coalesce if you want to import those nodes, even though they do not have source ID
USING PERIODIC COMMIT 10000 LOAD CSV FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
WITH line,coalesce(line[9],"NoId") as sourceID
MERGE (i:Intermediateries{name:line[1],internal_id:line[2],address:line[3],
valid_until:line[4],country_codes:line[5],countries:line[6],status:line[7],node_id:line[8],sourceID:sourceID})
I would create a query like this if I imported this if you have apoc plugin
USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
MERGE (i:Intermediateries{internal_id:line.internal_id})
ON CREATE SET i += apoc.map.clean(row.properties,['internal_id'],[])
if you do not have apoc plugin you have to specify properties manually
USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM "http://apps.dealopia.com/offshoreleaks/offshore_leaks_csvs-20170104/Intermediaries.csv" AS line
MERGE (i:Intermediateries{internal_id:line.internal_id})
ON CREATE SET i.name = line.name,i.address = line.address, i.valid_until = line.valid_until,
i.country_codes = line.country_code, i.countries = line.countries,i.status = line.status,i.node_id = line.node_id,
i.sourceID = line.sourceID,i.note = line.note
Upvotes: 1