Reputation: 431
I have two tables incidents and location.
I have downloaded them to csv files.
I run the following queries to create the nodes:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_location.csv" AS row
CREATE (:tb_location {
  latitude: row.latitude, longitude: row.longitude,
  location_description: row.location_description,
  community_area: row.community_area, block: row.block,
  district: row.district, ward: row.ward,
  x_coordinate: row.x_coordinate, y_coordinate: row.y_coordinate,
  id_city: row.id_city, composite_key: row.composite_key});
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_incident.csv" AS row
CREATE (:tb_incident {
  id: row.id, primary_type: row.primary_type, domestic: row.domestic,
  date: row.date, description: row.description, arrest: row.arrest,
  beat: row.beat, year: row.year, updated_on: row.updated_on,
  latitude: row.latitude, longitude: row.longitude,
  case_number: row.case_number,
  composite_foreign_key: row.composite_foreign_key});
Then I create indexes on the attributes I match on:
CREATE INDEX ON :tb_incident(composite_foreign_key);
CREATE INDEX ON :tb_location(composite_key);
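(Note: the `CREATE INDEX ON :Label(property)` syntax above is the legacy form; on Neo4j 4.x and later it is deprecated in favor of the `FOR ... ON` form. If you are on a newer release, the equivalent indexes would be:)

```cypher
// Neo4j 4.x+ equivalents of the two indexes above
CREATE INDEX FOR (i:tb_incident) ON (i.composite_foreign_key);
CREATE INDEX FOR (l:tb_location) ON (l.composite_key);
```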
Then I try to create the relationships:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_incident.csv" AS row1
MATCH(tb_incident:tb_incident{composite_foreign_key: row1.composite_foreign_key})
LOAD CSV WITH HEADERS FROM "file:////path/tb_location.csv" AS row2
MATCH(tb_location:tb_location{composite_key: row2.composite_key})
WHERE tb_incident.composite_foreign_key = tb_location.composite_key
MERGE (tb_incident)-[:occured_at]->(tb_location);
However, the last query links one incident to all the locations. I am new to Cypher and I couldn't figure out what I am doing wrong. I intend to link each incident with one location only. Please help me correct this query.
Upvotes: 0
Views: 238
Reputation: 67009
Since you have already imported all the data, there is no need to read the CSV files again, and doing so might be what is causing your issue: nesting the second LOAD CSV inside the first processes every location row once per incident row, which invites a cross product.
Try this instead:
MATCH (tb_incident:tb_incident), (tb_location:tb_location)
WHERE tb_incident.composite_foreign_key = tb_location.composite_key
MERGE (tb_incident)-[:occured_at]->(tb_location);
That query's complexity is N*M (where N is the number of tb_incident nodes and M is the number of tb_location nodes), so it can take a while. Unfortunately, neo4j does not use indexes when comparing values between two nodes.
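If that N*M query is too big to run in a single transaction, one common workaround (assuming the APOC plugin is installed) is to batch it with `apoc.periodic.iterate`, along these lines:

```cypher
// Sketch: run the cross-comparison MERGE in batches of 10000 pairs
// (requires the APOC plugin)
CALL apoc.periodic.iterate(
  "MATCH (i:tb_incident), (l:tb_location)
   WHERE i.composite_foreign_key = l.composite_key
   RETURN i, l",
  "MERGE (i)-[:occured_at]->(l)",
  {batchSize: 10000});
```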
[UPDATED]
When you were importing from the second CSV file, you had the opportunity to not only create each tb_incident node but also create its corresponding relationship. The MATCH this uses can take advantage of the index, since it does not compare values between nodes, which reduces the complexity of this step to just N:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:////path/tb_incident.csv" AS row
CREATE (tb_incident:tb_incident {
  id: row.id, primary_type: row.primary_type, domestic: row.domestic,
  date: row.date, description: row.description, arrest: row.arrest,
  beat: row.beat, year: row.year, updated_on: row.updated_on,
  latitude: row.latitude, longitude: row.longitude,
  case_number: row.case_number,
  composite_foreign_key: row.composite_foreign_key})
WITH tb_incident, row.composite_foreign_key AS cfk
MATCH (tb_location:tb_location{composite_key: cfk})
MERGE (tb_incident)-[:occured_at]->(tb_location);
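Afterwards, you can sanity-check that each incident ended up linked to exactly one location, for example (the pattern-expression `size()` shown here works on Neo4j 3.x/4.x; Neo4j 5 uses a `COUNT {}` subquery instead):

```cypher
// Group incidents by how many locations they are linked to;
// ideally every incident appears in the locations = 1 row
MATCH (i:tb_incident)
RETURN size((i)-[:occured_at]->()) AS locations, count(i) AS incidents
ORDER BY locations;
```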
Upvotes: 1