Reputation: 1309
I'm trying to create 1000+ nodes (or do nothing where a node already exists), each of which is very large in terms of the amount of data stored as properties on it. The properties for each node look like:
props: {
'HPSI0713i-aehn_2_QC1Hip-839.HPSI0813i-ffdb_3_QC1Hip-1.1': ['19','26','1.00','QC1Hip-839'],
[... 1431 more like this ...]
}
and take up about 650K if I store the data in a text file.
If I create a node with properties like these (using MERGE on a property with a uniqueness constraint, so that nothing happens if the node already exists), it takes 20-40 seconds per node.
To debug, I split node creation from setting the properties: I create the node first, get the node id back, then match on that id to set the properties. Node creation is as fast as expected. Here's my debug output from setting the properties:
will run cypher: MATCH (n) WHERE id(n) = 198058 SET n = { props } return n
- setting 1432 new properties to node with unique property 88aa3f215e73daea9bf65147e630cbd7_QC1Hip-1 took 19 seconds
will run cypher: MATCH (n) WHERE id(n) = 198059 SET n = { props } return n
- setting 1432 new properties to node with unique property 88aa3f215e73daea9bf65147e630cbd7_QC1Hip-10 took 22 seconds
One odd thing I've noticed during debugging is what happens if I delete these nodes like so:
MATCH (n:`labelforthesenodes`) OPTIONAL MATCH (n)-[r]-() DELETE n, r
and then try adding them again: it's fast for the nodes I've previously done, then slow again for nodes I haven't:
will run cypher: MATCH (n) WHERE id(n) = 198063 SET n = { props } return n
- setting 1432 new properties to node with unique property 88aa3f215e73daea9bf65147e630cbd7_QC1Hip-1 took 0 seconds
will run cypher: MATCH (n) WHERE id(n) = 198064 SET n = { props } return n
- setting 1432 new properties to node with unique property 88aa3f215e73daea9bf65147e630cbd7_QC1Hip-10 took 1 seconds
will run cypher: MATCH (n) WHERE id(n) = 198068 SET n = { props } return n
- setting 1432 new properties to node with unique property 88aa3f215e73daea9bf65147e630cbd7_QC1Hip-1016 took 24 seconds
What can be done to speed this operation up? At the moment I have a simple loop that creates one node and sets its properties per iteration, so it would take at least 6 hours to create my 1000 nodes.
Upvotes: 0
Views: 499
Reputation: 1309
I solved the performance issue by replacing the 1000+ properties (each holding an array of values) with a single property whose value is my entire data structure converted to a JSON string.
Now it takes ~0 seconds to create/update each node, instead of 20+.
Evidently there's some performance bottleneck in Neo4j when adding lots of (unindexed) properties to a node.
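For anyone wanting to try the same thing, here is a minimal sketch of the idea in Python. The property name `props_json` and the helper names are made up for illustration, and the `$param` syntax assumes a recent Neo4j version (older versions use the `{ param }` form shown in the question). It only builds the query and parameters; running them is left to whatever driver you use.

```python
import json


def build_set_json_query(node_id, props):
    """Return (cypher, params) that store all of props as ONE string property.

    'props_json' is a hypothetical property name, not something Neo4j requires.
    """
    cypher = (
        "MATCH (n) WHERE id(n) = $id "
        "SET n.props_json = $props_json "
        "RETURN id(n)"
    )
    params = {
        "id": node_id,
        # One JSON string instead of 1000+ separate array-valued properties.
        "props_json": json.dumps(props, sort_keys=True),
    }
    return cypher, params


def load_props(props_json):
    """Decode the stored string back into the original dictionary."""
    return json.loads(props_json)


if __name__ == "__main__":
    example = {
        "HPSI0713i-aehn_2_QC1Hip-839.HPSI0813i-ffdb_3_QC1Hip-1.1":
            ["19", "26", "1.00", "QC1Hip-839"],
    }
    query, parameters = build_set_json_query(198058, example)
    print(query)
    print(len(parameters["props_json"]), "characters of JSON")
```

The trade-off is that the individual entries can no longer be queried or indexed from Cypher; the client has to read the string back and decode it.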
Upvotes: 0
Reputation: 1778
As you are dealing with fairly large datasets, I suggest you look into batch insertion instead of creating a single node at a time.
For updates, you could maintain a dictionary of nodes in memory instead of writing to the database on every change, since creating or updating nodes in the database takes significantly more time than modifying the dictionary. Once all the node-related modifications are done, you can send the entire dictionary for batch insertion.
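A rough sketch of the dictionary-then-batch idea, in Python. Note the assumptions: it uses a parameterized `UNWIND ... MERGE` statement as one possible way to send many nodes per request (not Neo4j's dedicated batch inserter), and the label `Sample`, the key `uid`, and the property `props_json` are all hypothetical names. It only prepares the statements and parameters; executing them depends on your driver.

```python
import json

# Hypothetical label and property names; adjust to your own schema.
BATCH_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (n:Sample {uid: row.uid}) "
    "SET n.props_json = row.props_json"
)


def chunked(rows, size):
    """Yield consecutive slices of rows, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_batches(pending, batch_size=100):
    """Turn an in-memory {uid: props} dictionary into (cypher, params) pairs.

    Each pair carries up to batch_size nodes, so the database sees a few
    large requests instead of one request per node.
    """
    rows = [
        {"uid": uid, "props_json": json.dumps(props)}
        for uid, props in pending.items()
    ]
    return [(BATCH_QUERY, {"rows": batch}) for batch in chunked(rows, batch_size)]


if __name__ == "__main__":
    # Accumulate all modifications in the dictionary first...
    pending = {}
    for i in range(250):
        pending["node-%d" % i] = {"key": ["19", "26", "1.00"]}
    # ...then prepare the batched statements in one go.
    for query, params in build_batches(pending):
        print(len(params["rows"]), "nodes in this batch")
```

The batch size of 100 is arbitrary; with payloads of around 650K per node a smaller value may be more sensible.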
Upvotes: 0