Reputation: 21
I am trying to load 500,000 nodes, but the query does not execute successfully. Can anyone tell me the limit on the number of nodes in a Neo4j Community Edition database?
I am running this query:
result = session.run("""
    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM "file:///relationships.csv" AS row
    MERGE (s:Start {ac: row.START})
    ON CREATE SET s.START = row.START
    MERGE (e:End {en: row.END})
    ON CREATE SET e.END = row.END
    FOREACH (_ IN CASE row.TYPE WHEN "PAID" THEN [1] ELSE [] END |
        MERGE (s)-[:PAID {cr: row.CREDIT}]->(e))
    FOREACH (_ IN CASE row.TYPE WHEN "UNPAID" THEN [1] ELSE [] END |
        MERGE (s)-[:UNPAID {db: row.DEBIT}]->(e))
    RETURN s.START AS index, count(e) AS connections
    ORDER BY connections DESC
""")
Upvotes: 0
Views: 3472
Reputation: 11715
I don't think the Community Edition is more limited than the Enterprise Edition in that regard, and most of those limits were removed in 3.0 anyway.
Anyway, I can easily create a million nodes (in one transaction):
neo4j-sh (?)$ unwind range(1, 1000000) as i create (n:Node) return count(n);
+----------+
| count(n) |
+----------+
| 1000000 |
+----------+
1 row
Nodes created: 1000000
Labels added: 1000000
3495 ms
Running that 10 times, I've definitely created 10 million nodes:
neo4j-sh (?)$ match (n) return count(n);
+----------+
| count(n) |
+----------+
| 10000000 |
+----------+
1 row
3 ms
Your problem is most likely related to the size of the transaction: if it's too large, it can result in an OutOfMemory error, and before that it can slow the instance to a crawl because of all the garbage collection. Split the node creation into smaller batches, e.g. with USING PERIODIC COMMIT if you use LOAD CSV.
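As a minimal sketch of that batching (assuming a hypothetical nodes.csv with an id column, not your actual file):
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///nodes.csv" AS row
// commit every 10000 rows instead of one huge transaction
CREATE (n:Node {id: row.id});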
Update:
Your query already includes USING PERIODIC COMMIT and only creates 2 nodes and 1 relationship per line of the CSV file, so the problem most likely has more to do with the performance of the query itself than with the size of the transaction.
You have Start nodes with 2 properties set to the same value from the CSV (ac and START), and End nodes also with 2 properties set to the same value (en and END). Is there a uniqueness constraint on the property used for the MERGE? Without one, as nodes are created, processing each line takes longer and longer, because the MERGE has to scan all the existing nodes with the wanted label (an O(n^2) algorithm overall, which is pretty bad for 500K nodes).
CREATE CONSTRAINT ON (n:Start) ASSERT n.ac IS UNIQUE;
CREATE CONSTRAINT ON (n:End) ASSERT n.en IS UNIQUE;
That's probably the main improvement to apply.
However, do you actually need to MERGE the relationships (instead of CREATE)? Either the CSV contains a snapshot of the current credit relationships between all Start and End nodes (in which case there's a single relationship per pair), or it contains all transactions, and there's no real reason to merge those for the same amount.
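Putting those points together, a sketch of how the load could look with the constraints in place, CREATE for the relationships, and the redundant START/END properties dropped (assuming the same CSV columns as in your query):
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///relationships.csv" AS row
MERGE (s:Start {ac: row.START})
MERGE (e:End {en: row.END})
// CREATE instead of MERGE: no scan for an existing identical relationship
FOREACH (_ IN CASE row.TYPE WHEN "PAID" THEN [1] ELSE [] END |
    CREATE (s)-[:PAID {cr: row.CREDIT}]->(e))
FOREACH (_ IN CASE row.TYPE WHEN "UNPAID" THEN [1] ELSE [] END |
    CREATE (s)-[:UNPAID {db: row.DEBIT}]->(e));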
Finally, do you actually need to report the sorted, aggregated result from that loading query? It requires more memory and could be split into a separate query, after the loading has succeeded.
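For the reporting part, a separate query run after the load has succeeded could look like this (a sketch, using s.ac as the key since it holds the same value as START, and assuming you want the connection count per Start node):
MATCH (s:Start)-[r]->(e:End)
RETURN s.ac AS index, count(r) AS connections
ORDER BY connections DESC;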
Upvotes: 2