Reputation: 57
I have 2 csv files: Entity
, with 2.8M records and Rships
, with 4.2M records. Entity
has a list of ENT_ID
and PARENTID
. If ENT_ID
has PARENTID
'0', it means no parent. If it does, then it would be one of the ENT_ID
from above. I need to create a relationship of ENT_ID
and PARENTID
. I tried loading this through the import tool available for Neo4j2.3.4 community edition, however I kept receiving very many errors. In the end, I managed to load it using the load csv cypher query.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/...(read file address here)/Entities.txt" AS Entity FIELDTERMINATOR '|'
CREATE (n:Entity{ENT_ID: Entity.ENT_ID,NAME: Entity.NAME,ENTRYTYPE: Entity.ENTRYTYPE, PARENTID: Entity.PARENTID,ENTRYCATEGORY: Entity.ENTRYCATEGORY,ENTRYSUBCATEGORY: Entity.ENTRYSUBCATEGORY,COUNTRY: Entity.COUNTRY,PWC_ADL_ID: Entity.PWC_ADL_ID })
I created relationship between the PARENTID and ENTID using the following:
PROFILE
MATCH(Entity)
MATCH (a:Entity {ENT_ID : Entity.ENT_ID})
WITH Entity, a
MATCH (b:Entity {ENT_ID : Entity.PARENTID})
WITH a,b
MERGE (a)-[r:RELATION]->(b)
Now I loaded the CSV file for relationship as :
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/.../EntitiesRelationships.txt" AS Rships FIELDTERMINATOR '|'
CREATE (n:Rships{RID: Rships.RID,Ent_IDParent: Rships.Ent_IDParent,Ent_IDChild: Rships.Ent_IDChild, RelationID: Rships.RelationID })
I created index on both the CSVs:
CREATE INDEX ON :Entity(ENT_ID)
CREATE INDEX ON :Rships(Ent_IDParent)
CREATE INDEX ON :Rships(Ent_IDChild)
Till here the above code runs fine although it takes nearly a day to do so:
Now, when I try to create relationship using the query:
PROFILE
Match(Rships)
MATCH(a:Rships {ENT_IDParent: Rships.ENT_IDParent})
WITH Rships, a
MATCH(b:Rships {ENT_IDParent: Rships.ENT_IDChild})
WITH a,b
MERGE (a)-[r:RELATION]->(b)
This query keeps on running for about an hour without any result.
Help would be appreciated.
Thanks
Upvotes: 2
Views: 42
Reputation: 41706
Labels, Properties and relationship-types are case-sensitive. Your index has different spellings than the properties in your statement.
You should also look at your statements with EXPLAIN or PROFILE then you see it immediately.
For the 2nd statement I would also use LOAD CSV again, to drive the lookup of the entities, so that you get periodic commit.
In total your import shouldn't take longer than a few minutes.
// create unique constraint
CREATE CONSTRAINT ON (n:Entity) ASSERT n.ENT_ID IS UNIQUE;
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/...(read file address here)/Entities.txt" AS row FIELDTERMINATOR '|'
CREATE (n:Entity {ENT_ID: row.ENT_ID,NAME: row.NAME,ENTRYTYPE: row.ENTRYTYPE, PARENTID: row.PARENTID, ENTRYCATEGORY: row.ENTRYCATEGORY, ENTRYSUBCATEGORY: row.ENTRYSUBCATEGORY, COUNTRY: row.COUNTRY, PWC_ADL_ID: row.PWC_ADL_ID });
// you can also use this if you want to set all properties:
// CREATE (n:Entity) SET n = row
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/...(read file address here)/Entities.txt" AS row FIELDTERMINATOR '|'
MATCH (a:Entity {ENT_ID : row.ENT_ID})
MATCH (b:Entity {ENT_ID : row.PARENTID})
MERGE (a)-[:PARENT]->(b);
Why do you create the relationships as nodes and not as relationships, that doesn't make sense to me ???
Instead of:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/.../EntitiesRelationships.txt" AS row FIELDTERMINATOR '|'
CREATE (n:Rships {RID: row.RID,Ent_IDParent: row.Ent_IDParent, Ent_IDChild: row.Ent_IDChild, RelationID: row.RelationID });
I would do:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/.../EntitiesRelationships.txt" AS row FIELDTERMINATOR '|'
MATCH (a:Entity {ENT_ID : row.Ent_IDChild})
MATCH (b:Entity {ENT_ID : row.Ent_IDParent})
CREATE (a)-[:PARENT {RID: row.RID, RelationID: row.RelationID}]->(b);
I created index on both the CSVs:
CREATE INDEX ON :Entity(ENT_ID)
CREATE INDEX ON :Rships(Ent_IDParent)
CREATE INDEX ON :Rships(Ent_IDChild)
Till here the above code runs fine although it takes nearly a day to do so:
Now, when I try to create relationship using the query:
// you also had a typo in this query, matching b also for the parent-id
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///C:/.../EntitiesRelationships.txt" AS row FIELDTERMINATOR '|'
MATCH(a:Rships {Ent_IDParent: row.ENT_IDParent})
MATCH(b:Rships {Ent_IDChild: row.ENT_IDChild})
MERGE (a)-[:PARENT]->(b)
These two statements make no sense at all, you create random cross products, profile should show you a huge blow up in rows and database-hits like billions
MATCH(Entity) MATCH (a:Entity {ENT_ID : Entity.ENT_ID}) WITH Entity, a MATCH (b:Entity {ENT_ID : Entity.PARENTID}) WITH a,b MERGE (a)-[r:RELATION]->(b)
Match(Rships) MATCH(a:Rships {ENT_IDParent: Rships.ENT_IDParent}) WITH Rships, a MATCH(b:Rships {ENT_IDParent: Rships.ENT_IDChild}) WITH a,b MERGE (a)-[r:RELATION]->(b)
Upvotes: 1