Deepak Kumar

Reputation: 21

What is the limitation of the Neo4j community edition in terms of data storage (i.e. the number of records)?

I am trying to load 500,000 nodes, but the query does not execute successfully. Can anyone tell me the limit on the number of nodes in a Neo4j community edition database?

I am running this query:

result = session.run("""
    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM "file:///relationships.csv" AS row
    MERGE (s:Start {ac: row.START})
      ON CREATE SET s.START = row.START
    MERGE (e:End {en: row.END})
      ON CREATE SET e.END = row.END
    FOREACH (_ IN CASE row.TYPE WHEN "PAID" THEN [1] ELSE [] END |
        MERGE (s)-[:PAID {cr: row.CREDIT}]->(e))
    FOREACH (_ IN CASE row.TYPE WHEN "UNPAID" THEN [1] ELSE [] END |
        MERGE (s)-[:UNPAID {db: row.DEBIT}]->(e))
    RETURN s.START AS index, count(e) AS connections
    ORDER BY connections DESC
""")

Upvotes: 0

Views: 3472

Answers (1)

Frank Pavageau

Reputation: 11715

I don't think the community edition is more limited than the enterprise edition in that regard, and most of the limits have been removed in 3.0.

Anyway, I can easily create a million nodes (in one transaction):

neo4j-sh (?)$ unwind range(1, 1000000) as i create (n:Node) return count(n);
+----------+
| count(n) |
+----------+
| 1000000  |
+----------+
1 row
Nodes created: 1000000
Labels added: 1000000
3495 ms

Running that 10 times, I've definitely created 10 million nodes:

neo4j-sh (?)$ match (n) return count(n);
+----------+
| count(n) |
+----------+
| 10000000 |
+----------+
1 row
3 ms

Your problem is most likely related to the size of the transaction: if it's too large, it can result in an OutOfMemory error, and before that it can slow the instance to a crawl because of all the garbage collection. Split the node creation into smaller batches, e.g. with USING PERIODIC COMMIT if you use LOAD CSV.
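As an illustration, a batched load looks roughly like this (the file name nodes.csv and its name column are hypothetical, not taken from the question):

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///nodes.csv" AS row
// Each batch of 10000 rows is committed separately, so the transaction
// state stays small instead of accumulating in memory.
MERGE (n:Node {name: row.name});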


Update:

Your query already includes USING PERIODIC COMMIT and only creates 2 nodes and 1 relationship per line of the CSV file, so the problem most likely has more to do with the performance of the query itself than with the size of the transaction.

You have Start nodes with 2 properties set to the same value from the CSV (ac and START), and End nodes also with 2 properties set to the same value (en and END). Is there a uniqueness constraint on the property used for the MERGE? Without one, each MERGE has to scan all the existing nodes with that label, so processing each line takes longer and longer as nodes are created (an O(n^2) algorithm overall, which is pretty bad for 500K nodes).

CREATE CONSTRAINT ON (n:Start) ASSERT n.ac IS UNIQUE;
CREATE CONSTRAINT ON (n:End) ASSERT n.en IS UNIQUE;

That's probably the main improvement to apply.

However, do you actually need to MERGE the relationships (instead of CREATE)? Either the CSV contains a snapshot of the current credit relationships between all Start and End nodes (in which case there's a single relationship per pair), or it contains all transactions and there's no real reason to merge those for the same amount.
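For instance, if each CSV line represents a distinct transaction, the two FOREACH clauses from your query could simply use CREATE (a sketch, leaving the rest of the query unchanged):

FOREACH (_ IN CASE row.TYPE WHEN "PAID" THEN [1] ELSE [] END |
    // CREATE always adds a new relationship, so no lookup of an existing one is needed
    CREATE (s)-[:PAID {cr: row.CREDIT}]->(e))
FOREACH (_ IN CASE row.TYPE WHEN "UNPAID" THEN [1] ELSE [] END |
    CREATE (s)-[:UNPAID {db: row.DEBIT}]->(e))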

Finally, do you actually need to report the sorted, aggregated result from that loading query? It requires more memory and could be split into a separate query, after the loading has succeeded.
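For example, the report could be run as a separate query once the load is done (a sketch using the labels and relationship types above):

MATCH (s:Start)-[r]->(e:End)
RETURN s.ac AS index, count(r) AS connections
ORDER BY connections DESC;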

Upvotes: 2
