Reputation: 811
I have a relatively large Neo4j graph with 7 million vertices and 5 million relationships.
When I try to find the subtree size for one node, Neo4j gets stuck traversing 600,000 nodes, only 130 of which are unique.
It does this because of cycles.
It looks like it applies DISTINCT only after it has traversed the whole graph to the maximum depth.
Is it possible to change this behaviour somehow?
The query is:
match (a1)-[o1*1..]->(a2) WHERE a1.id = '123' RETURN distinct a2
Upvotes: 1
Views: 137
Reputation: 66989
You can iteratively step through the subgraph a "layer" at a time while avoiding reprocessing the same node multiple times, by using the APOC procedure apoc.periodic.commit. That procedure iteratively processes a query until it returns 0.
Here is an example of this technique. It:
- Uses a TempNode node to keep track of a couple of important values between iterations, one of which will eventually contain the distinct ids of the nodes in the subgraph (except for the "root" node's id, since your question's query also leaves that out).
- Assumes that the relevant nodes have the label Foo, and that you have an index on Foo(id). This is for speeding up the MATCH operations, and is not strictly necessary.
WITH '123' AS rootId
MERGE (temp:TempNode)
SET temp.allIds = [rootId], temp.layerIds = [rootId];
CALL apoc.periodic.commit("
MATCH (temp:TempNode)
UNWIND temp.layerIds AS id
MATCH (n:Foo) WHERE n.id = id
OPTIONAL MATCH (n)-->(next)
WHERE NOT next.id IN temp.allIds
WITH temp, COLLECT(DISTINCT next.id) AS layerIds
SET temp.allIds = temp.allIds + layerIds, temp.layerIds = layerIds
RETURN SIZE(layerIds);
");
MATCH (temp:TempNode)
// ... use temp.allIds, which contains the distinct ids in the subgraph ...
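To make the layer-at-a-time idea easier to follow outside of Cypher, here is a minimal in-memory sketch in Python. It is not part of the APOC solution and does not talk to Neo4j. The function name subtree_ids and the sample edge list are made up for illustration, and the variables all_ids and layer_ids are only meant to mirror temp.allIds and temp.layerIds.

```python
from collections import defaultdict


def subtree_ids(edges, root_id):
    """Return the ids reachable from root_id, excluding the root itself.

    `edges` is an iterable of (source_id, target_id) pairs (a hypothetical
    stand-in for the outgoing relationships in the graph).
    """
    # Adjacency map: node id -> set of ids it points to.
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)

    all_ids = {root_id}    # plays the role of temp.allIds
    layer_ids = {root_id}  # plays the role of temp.layerIds
    while layer_ids:
        next_layer = set()
        for node_id in layer_ids:
            # Only keep neighbours that have not been seen in any layer yet,
            # so cycles never cause a node to be expanded twice.
            next_layer |= out[node_id] - all_ids
        all_ids |= next_layer
        layer_ids = next_layer
    return all_ids - {root_id}


# Small sample graph with cycles (a <-> b, and b back to the root).
edges = [("123", "a"), ("a", "b"), ("b", "a"), ("b", "123"), ("a", "c")]
reachable = subtree_ids(edges, "123")
```

Each node is expanded at most once, so the work grows with the number of distinct nodes and relationships in the subgraph rather than with the number of distinct paths through its cycles.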
Upvotes: 2