Nik

Reputation: 431

Neo4j cypher LOAD command with wrong where condition

I have two tables incidents and location.

I have downloaded them to csv files.

I run the following queries to create the nodes:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_location.csv" AS row
CREATE (:tb_location {
  latitude: row.latitude, longitude: row.longitude, location_description: row.location_description,
  community_area: row.community_area, block: row.block, district: row.district, ward: row.ward,
  x_coordinate: row.x_coordinate, y_coordinate: row.y_coordinate, id_city: row.id_city,
  composite_key: row.composite_key});


USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_incident.csv" AS row
CREATE (:tb_incident {
  id: row.id, primary_type: row.primary_type, domestic: row.domestic, date: row.date,
  description: row.description, arrest: row.arrest, beat: row.beat, year: row.year,
  updated_on: row.updated_on, latitude: row.latitude, longitude: row.longitude,
  case_number: row.case_number, composite_foreign_key: row.composite_foreign_key});

Then I create indexes on the match attributes:

CREATE INDEX ON :tb_incident(composite_foreign_key);

CREATE INDEX ON :tb_location(composite_key);

Then I try to create the relationships:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_incident.csv" AS row1
MATCH(tb_incident:tb_incident{composite_foreign_key: row1.composite_foreign_key})
LOAD CSV WITH HEADERS FROM "file:////path/tb_location.csv" AS row2
MATCH(tb_location:tb_location{composite_key: row2.composite_key})
WHERE tb_incident.composite_foreign_key = tb_location.composite_key
MERGE (tb_incident)-[:occured_at]->(tb_location);

However, the last query links one incident to all the locations. I am new to Cypher and I couldn't work out what I am doing wrong. I intend to link each incident to exactly one location. Please help me correct this query.

Upvotes: 0

Views: 238

Answers (1)

cybersam

Reputation: 67009

Since you have already imported all the data, there is no need to read your CSV files again. In fact, re-reading them in that last query may be what is causing your issue.

Try this instead:

MATCH (tb_incident:tb_incident), (tb_location:tb_location)
WHERE tb_incident.composite_foreign_key = tb_location.composite_key
MERGE (tb_incident)-[:occured_at]->(tb_location);

That query's complexity is N*M (where N is the number of tb_incident nodes and M is the number of tb_location nodes), so it can take a while. Unfortunately, Neo4j does not use indexes when comparing values between nodes.
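As a quick sanity check after the MERGE finishes (a hypothetical follow-up query, not part of the original answer), you can compare the relationship count against the incident count; if each incident matched exactly one location, the two numbers should agree:

MATCH (:tb_incident)-[r:occured_at]->(:tb_location)
RETURN count(r) AS relationship_count;

MATCH (i:tb_incident)
RETURN count(i) AS incident_count;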

[UPDATED]

Improving performance by creating relationship during import

When you were importing from the second CSV file, you had the opportunity to not only create each tb_incident node, but also create the corresponding relationship. The MATCH it uses should be able to use the index, since it does not need to compare values between nodes. This means that the complexity of this step is reduced to just N:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_incident.csv" AS row
CREATE(tb_incident:tb_incident{ id: row.id,primary_type: row.primary_type,domestic: row.domestic,date: row.date,description: row.description,arrest: row.arrest,beat: row.beat,year: row.year,updated_on: row.updated_on,latitude : row.latitude,longitude: row.longitude,case_number: row.case_number,composite_foreign_key: row.composite_foreign_key})
WITH tb_incident, row.composite_foreign_key AS cfk
MATCH (tb_location:tb_location{composite_key: cfk})
MERGE (tb_incident)-[:occured_at]->(tb_location);
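If the earlier buggy query already created spurious relationships, here is a sketch for clearing them before re-running the import (this assumes :occured_at relationships exist only between these two labels and that deleting all of them is safe):

MATCH (:tb_incident)-[r:occured_at]->(:tb_location)
DELETE r;

On a very large graph this delete may itself need to be batched to avoid running out of memory in a single transaction.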

Upvotes: 1
