Reputation: 4468
I have a graph with a node that has many outgoing relationships. The time it takes to add new outgoing relationships degrades as I add more relationships. The degradation appears to be due to the time taken to check that the relationship doesn't already exist (I'm using MERGE to add the relationships).
The destination nodes of the relationships have very few relationships themselves. Is there any way I can force Neo4j check for the existence of the relationship from the destination node instead of from the source node?
Here's test script to reproduce the problem. It creates one node with id 0 followed by 1000 nodes connected to node 0 by the HAS
relationship. As nodes are added the execution time increases linearly.
CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE
UNWIND RANGE(1,1000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to
Added 1001 labels, created 1001 nodes, set 1001 properties, created 1000 relationships, statement executed in 3496 ms.
UNWIND RANGE(1001,2000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to
Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 7030 ms.
UNWIND RANGE(2001,3000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to
Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 10489 ms.
UNWIND RANGE(3001,4000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to
Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 14390 ms.
If CREATE
is used instead of MERGE
the performance is much better. I can't use CREATE
though because I want to ensure the relationships are unique.
UNWIND RANGE(4001,5000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
CREATE (from)-[:HAS]->to
Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 413 ms.
Note: Tested with Neo4j v2.2.2
Upvotes: 8
Views: 913
Reputation: 41676
This is because cypher is not clever enough yet to use the degree of the nodes when executing merge. In the COST optimizer which is used for reads it is already cleverer but for updates the old RULE optimizer is used.
After playing around with it for a bit unsuccessfully
* changing the order of from & to
* using CREATE UNIQUE
instead of MERGE
* trying to use path-expressions which use get-degree in COST
I remembered that shortestPath
actually takes degree's into account and also goes from left to right
So I tried to combine that with CREATE
, and it worked really well,
here is an example for 100.000 nodes.
neo4j-sh (?)$ CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE;
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
1054 ms
neo4j-sh (?)$
neo4j-sh (?)$ UNWIND RANGE(0,100000) AS i CREATE (to:Node { id: i});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 100001
Properties set: 100001
Labels added: 100001
2375 ms
neo4j-sh (?)$
neo4j-sh (?)$
neo4j-sh (?)$ MATCH (from:Node { id: 0 })
> UNWIND RANGE(1,100000) AS i
> MATCH (to:Node { id: i})
> WHERE shortestPath((to)<-[:HAS]-(from)) IS NULL
> CREATE (from)-[:HAS]->(to);
+-------------------+
| No data returned. |
+-------------------+
Relationships created: 100000
2897 ms
neo4j-sh (?)$
neo4j-sh (?)$
neo4j-sh (?)$ MATCH (from:Node { id: 0 })
> UNWIND RANGE(1,100000) AS i
> MATCH (to:Node { id: i})
> WHERE shortestPath((to)<-[:HAS]-(from)) IS NULL
> CREATE (from)-[:HAS]->(to);
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
2360 ms
Upvotes: 10