Dave
Dave

Reputation: 4468

How can I optimise a Neo4j MERGE query on a node with many relationships?

I have a graph with a node that has many outgoing relationships. The time it takes to add new outgoing relationships degrades as I add more relationships. The degradation appears to be due to the time taken to check that the relationship doesn't already exist (I'm using MERGE to add the relationships).

The destination nodes of the relationships have very few relationships themselves. Is there any way I can force Neo4j check for the existence of the relationship from the destination node instead of from the source node?

Here's test script to reproduce the problem. It creates one node with id 0 followed by 1000 nodes connected to node 0 by the HAS relationship. As nodes are added the execution time increases linearly.

CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE

UNWIND RANGE(1,1000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to

Added 1001 labels, created 1001 nodes, set 1001 properties, created 1000 relationships, statement executed in 3496 ms.

UNWIND RANGE(1001,2000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to

Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 7030 ms.

UNWIND RANGE(2001,3000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to

Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 10489 ms.

UNWIND RANGE(3001,4000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
MERGE (from)-[:HAS]->to

Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 14390 ms.

If CREATE is used instead of MERGE the performance is much better. I can't use CREATE though because I want to ensure the relationships are unique.

UNWIND RANGE(4001,5000) AS i
MERGE (from:Node { id: 0 })
MERGE (to:Node { id: i})
CREATE (from)-[:HAS]->to

Added 1000 labels, created 1000 nodes, set 1000 properties, created 1000 relationships, statement executed in 413 ms.

Note: Tested with Neo4j v2.2.2

Upvotes: 8

Views: 913

Answers (1)

Michael Hunger
Michael Hunger

Reputation: 41676

This is because cypher is not clever enough yet to use the degree of the nodes when executing merge. In the COST optimizer which is used for reads it is already cleverer but for updates the old RULE optimizer is used.

After playing around with it for a bit unsuccessfully * changing the order of from & to * using CREATE UNIQUE instead of MERGE * trying to use path-expressions which use get-degree in COST

I remembered that shortestPath actually takes degree's into account and also goes from left to right

So I tried to combine that with CREATE, and it worked really well, here is an example for 100.000 nodes.

neo4j-sh (?)$ CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE;
+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1
1054 ms
neo4j-sh (?)$ 
neo4j-sh (?)$ UNWIND RANGE(0,100000) AS i CREATE (to:Node { id: i});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 100001
Properties set: 100001
Labels added: 100001
2375 ms
neo4j-sh (?)$ 
neo4j-sh (?)$ 
neo4j-sh (?)$ MATCH (from:Node { id: 0 })
> UNWIND RANGE(1,100000) AS i
> MATCH (to:Node { id: i})
> WHERE shortestPath((to)<-[:HAS]-(from)) IS NULL
> CREATE (from)-[:HAS]->(to);
+-------------------+
| No data returned. |
+-------------------+
Relationships created: 100000
2897 ms
neo4j-sh (?)$ 
neo4j-sh (?)$ 
neo4j-sh (?)$ MATCH (from:Node { id: 0 })
> UNWIND RANGE(1,100000) AS i
> MATCH (to:Node { id: i})
> WHERE shortestPath((to)<-[:HAS]-(from)) IS NULL
> CREATE (from)-[:HAS]->(to);
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
2360 ms

Upvotes: 10

Related Questions