Reputation: 135
I have a user.csv file with students:
id, first_name, last_name, locale, gender
1, Hasso, Plattner, en, male
2, Tina, Turner, de, female
and a memberships.csv file with course memberships of the students:
id, user_id, course_id
1, 1, 3
2, 1, 4
3, 2, 4
4, 2, 5
To transform students and courses into vertices and course memberships into edges, I joined the user information into memberships.csv:
id, user_id, first_name, last_name, course_id, locale, gender
1, 1, Hasso, Plattner, 3, en, male
2, 1, Hasso, Plattner, 4, en, male
3, 2, Tina, Turner, 4, de, female
4, 2, Tina, Turner, 5, de, female
and used LOAD CSV with some constraints and MERGE:
create constraint on (g:Gender) assert g.gender is unique
create constraint on (l:locale) assert l.locale is unique
create constraint on (c:Course) assert c.course is unique
create constraint on (s:Student) assert s.student is unique
USING PERIODIC COMMIT 20000
LOAD CSV WITH HEADERS FROM
'file: memberships.csv'
AS line
MERGE (s:Student {id: line.id, name: line.first_name +" "+line.last_name })
MERGE (c:Course {id: line.course_id})
MERGE (g:Gender {gender:line.gender})
MERGE (l:locale {locale:line.locale})
MERGE (s)-[:HAS_GENDER]->(g)
MERGE (s)-[:HAS_LANGUAGE]->(l)
MERGE (s)-[:ENROLLED_IN]->(c)
For 1,000 memberships Neo4j needs 2 seconds to load, for 10,000 memberships 3 minutes, and for 100,000 it fails with 'Unknown error'.
i) How do I get rid of the error? ii) Is there a more elegant way to load such a structure from .csv with about 600,000 memberships?
I am using a local machine with a 2.4 GHz CPU and 16 GB RAM.
Upvotes: 2
Views: 446
Reputation: 41676
Try importing the nodes from their own CSV files first, and then the relationships in a separate pass.
Also try an import run without the Gender and Locale nodes, storing those values as properties on the Student nodes instead.
If you really need those (dense) nodes later on, try to run it like this:
CREATE (g:Gender {gender:"male"})
WITH g
MATCH (s:Student {gender:"male"})
CREATE (s)-[:HAS_GENDER]->(g)
Those relationships are guaranteed to be unique, and CREATE is cheaper than MERGE. I assume that checking 2*(n-1) relationships per inserted student adds up, which would make the import O(n^2) overall.
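The two-pass import suggested above could be sketched roughly as follows. This is only an illustration under several assumptions: the file URLs are placeholders, the CSV headers are assumed to contain no leading spaces, and the constraints are placed on `id` because that is the property the MERGE and MATCH clauses look up (a constraint on a different property does not give those lookups an index). The `gender` and `locale` property names on Student are likewise just one possible choice.

```cypher
// Assumption: unique constraints on the looked-up property (id),
// so MERGE/MATCH by id can use the backing index.
CREATE CONSTRAINT ON (s:Student) ASSERT s.id IS UNIQUE;
CREATE CONSTRAINT ON (c:Course) ASSERT c.id IS UNIQUE;

// Pass 1: one Student node per row of the original user.csv.
// Gender and locale are stored as plain properties.
USING PERIODIC COMMIT 20000
LOAD CSV WITH HEADERS FROM 'file:///path/to/user.csv' AS line
MERGE (s:Student {id: line.id})
ON CREATE SET s.name = line.first_name + " " + line.last_name,
              s.gender = line.gender,
              s.locale = line.locale;

// Pass 2: one Course node per distinct course_id in memberships.csv.
USING PERIODIC COMMIT 20000
LOAD CSV WITH HEADERS FROM 'file:///path/to/memberships.csv' AS line
MERGE (c:Course {id: line.course_id});

// Pass 3: relationships only; both endpoints already exist.
// CREATE assumes each (user_id, course_id) pair occurs once in the file;
// use MERGE here instead if the file may contain duplicates.
USING PERIODIC COMMIT 20000
LOAD CSV WITH HEADERS FROM 'file:///path/to/memberships.csv' AS line
MATCH (s:Student {id: line.user_id})
MATCH (c:Course {id: line.course_id})
CREATE (s)-[:ENROLLED_IN]->(c);
```

The semicolons assume the statements are run one after another from a script or the shell; in the browser each statement would be submitted separately. With this layout the original, unjoined memberships.csv is sufficient, so the join step is not needed.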
Upvotes: 0
Reputation: 3308
The Neo4j browser has a 60 second timeout on Cypher queries (due to the HTTP transport). This does not mean that your query failed to run to completion; in fact, there has been no error at the database level. The query keeps running after the browser times out, but you will not be able to see its result. To see long-running queries run to completion, please use the Neo4j shell.
http://docs.neo4j.org/chunked/stable/shell.html
Upvotes: 0