Reputation: 183
Neo4j Version 2.2.4
I use LOAD CSV to import a huge collection of nodes and relationships. I use MERGE to get or create the nodes. For performance I also created a unique index for the node property.
CREATE CONSTRAINT ON (e:RESSOURCE) assert e.url is unique;
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MERGE (subject:RESSOURCE {url: trim(line[0])})
MERGE (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);
The problem is that the import of about 1 million edges performs very badly. I profiled the import as well as single MERGE queries, and I could not see the unique index being used. In contrast, a MATCH query does use the index. What can I do to make MERGE use the index?
Upvotes: 5
Views: 641
Reputation: 41676
Peter is correct. Some more explanation:
You are running into the Eager problem, see http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/. You should see it in your EXPLAIN output (remove the periodic commit and prefix the query with EXPLAIN):
+--------------+----------------------------------+-----------------------+
|     Operator |                      Identifiers |                 Other |
+--------------+----------------------------------+-----------------------+
| +EmptyResult |                                  |                       |
| |            +----------------------------------+-----------------------+
| +UpdateGraph | anon[179], line, object, subject |    CreateRelationship |
| |            +----------------------------------+-----------------------+
| +UpdateGraph |            line, object, subject | MergeNode; :RESSOURCE |
| |            +----------------------------------+-----------------------+
| +Eager       |                    line, subject |                       |
| |            +----------------------------------+-----------------------+
| +UpdateGraph |                    line, subject | MergeNode; :RESSOURCE |
| |            +----------------------------------+-----------------------+
| +LoadCSV     |                             line |                       |
+--------------+----------------------------------+-----------------------+
The Eager operator pulls in your whole CSV file before the next step runs, to ensure isolation between the two MERGE operations. That effectively disables your periodic commit.
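To check for this yourself, here is a sketch of the statement from the question with USING PERIODIC COMMIT removed and EXPLAIN put in front. It should only display the plan without running the import:

```cypher
// Statement from the question, without USING PERIODIC COMMIT.
// EXPLAIN returns the execution plan only; look for an Eager operator in it.
EXPLAIN
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MERGE (subject:RESSOURCE {url: trim(line[0])})
MERGE (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);
```

If the plan contains no Eager operator, periodic commit can work batch by batch as intended.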
You could also try two passes over the file, first merging the nodes and then creating the relationships:
CREATE CONSTRAINT ON (e:RESSOURCE) assert e.url is unique;
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
// The upper bound of a list slice is exclusive: [0..2] covers line[0] and line[1]
FOREACH (url in line[0..2] |
MERGE (subject:RESSOURCE {url: trim(url)})
);
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data.csv' AS line FIELDTERMINATOR '\t'
MATCH (subject:RESSOURCE {url: trim(line[0])})
MATCH (object:RESSOURCE {url: trim(line[1])})
CREATE (subject)-[:EQUIVALENCE]->(object);
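As a rough sanity check after the second pass, you can count what was created. This is only a sketch and assumes the database was empty before the import; the aliases resources and equivalences are made up for illustration. The number of EQUIVALENCE relationships should then equal the number of CSV rows for which both URLs were matched.

```cypher
// Number of distinct URLs that ended up as nodes
MATCH (r:RESSOURCE)
RETURN count(r) AS resources;

// Number of relationships created by the second pass
MATCH (:RESSOURCE)-[e:EQUIVALENCE]->(:RESSOURCE)
RETURN count(e) AS equivalences;
```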
Upvotes: 6
Reputation: 406
Try this:
MATCH (subject:RESSOURCE {url: trim(line[0])}), (object:RESSOURCE {url: trim(line[1])})
MERGE (subject)-[:EQUIVALENCE]->(object)
Edit: I see you also want to merge the nodes. I would suggest doing a MERGE for each node:
MERGE (subject:RESSOURCE {url: trim(line[0])})
I also recommend trimming the values when you build the CSV file. That way Neo4j does not have to do it for every row, and the Cypher gets simpler.
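For illustration, with a pre-trimmed input the relationship query above might look roughly like this. The file name data_trimmed.csv is hypothetical; it stands for a copy of the input whose values were stripped of surrounding whitespace when it was generated.

```cypher
// Assumption: data_trimmed.csv (hypothetical name) already contains trimmed URLs,
// so the trim() calls can be dropped from the lookups.
USING PERIODIC COMMIT 10000
LOAD CSV FROM 'file:///Users/x/data_trimmed.csv' AS line FIELDTERMINATOR '\t'
MATCH (subject:RESSOURCE {url: line[0]}), (object:RESSOURCE {url: line[1]})
MERGE (subject)-[:EQUIVALENCE]->(object);
```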
Edit 2 (thanks to a comment by Kai, who corrected my MERGE statement above):
If you want to do a more complex MERGE that also sets other properties, you could do this:
MERGE (subject:RESSOURCE {url: trim(line[0])})
ON CREATE SET subject.source = trim(line[1])
ON MATCH SET subject.source = trim(line[1])
Upvotes: 3