Neo4j performance with cycles

Question

I have a relatively large Neo4j graph with 7 million vertices and 5 million relationships.

When I try to find the subtree size for one node, Neo4j gets stuck traversing 600,000 nodes, only 130 of which are unique. It does this because of cycles. It looks like it applies DISTINCT only after it has traversed the whole graph to maximum depth.

Is it possible to change this behaviour somehow?

The query is:

MATCH (a1)-[o1*1..]->(a2) WHERE a1.id = '123' RETURN DISTINCT a2

cybersam · Accepted Answer

You can iteratively step through the subgraph a "layer" at a time while avoiding reprocessing the same node multiple times, by using the APOC procedure apoc.periodic.commit. That procedure iteratively processes a query until it returns 0.
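To see why the original query touches far more nodes than it returns, here is a small plain-Python model (not Neo4j or APOC code). It assumes what the Cypher manual describes for variable-length patterns: every path is enumerated, and a relationship may not repeat within one path, but the same node can be reached over and over. The toy graph and the names build_diamond_chain and enumerate_paths are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, List[int]]


def build_diamond_chain(diamonds: int) -> Graph:
    """Hypothetical toy graph: top -> {left, right} -> bottom, repeated.

    The bottom of one diamond is the top of the next, so every diamond
    doubles the number of paths that lead further down the chain.
    """
    graph: Graph = {}
    for d in range(diamonds):
        top = 3 * d
        left, right, bottom = top + 1, top + 2, top + 3
        graph[top] = [left, right]
        graph[left] = [bottom]
        graph[right] = [bottom]
    graph.setdefault(3 * diamonds, [])  # last bottom node has no children
    return graph


def enumerate_paths(graph: Graph, start: int) -> Tuple[int, Set[int]]:
    """Count every path of length >= 1 from start that never reuses an edge.

    Returns (number of paths, set of distinct end nodes). Edges are keyed
    by (source, target), so this sketch assumes no parallel edges.
    """
    path_count = 0
    end_nodes: Set[int] = set()

    def walk(node: int, used: FrozenSet[Tuple[int, int]]) -> None:
        nonlocal path_count
        for nxt in graph.get(node, []):
            edge = (node, nxt)
            if edge in used:
                continue  # an edge may appear only once per path
            path_count += 1
            end_nodes.add(nxt)
            walk(nxt, used | {edge})

    walk(start, frozenset())
    return path_count, end_nodes


if __name__ == "__main__":
    paths, ends = enumerate_paths(build_diamond_chain(10), 0)
    print(f"{paths} paths, {len(ends)} distinct nodes")
    # prints "4092 paths, 30 distinct nodes"
```

Ten small diamonds already give 4092 paths for 30 distinct nodes, and DISTINCT can only collapse them after they have all been produced. Cycles and shared descendants multiply paths in the same way.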

Here is an example of the layer-by-layer technique. It:

Uses a temporary TempNode node to keep track of a couple of important values between iterations, one of which (allIds) will eventually contain the distinct ids of the nodes in the subgraph. Because Step 1 seeds allIds with the "root" node's id, the root appears in that list as well; filter it out if, like your question's query, you do not want the root itself counted.
Assumes that all the nodes you care about share the same label, Foo, and that you have an index on Foo(id). This is for speeding up the MATCH operations, and is not strictly necessary.

Step 1: Create TempNode (using MERGE, to reuse existing node, if any)

WITH '123' AS rootId
MERGE (temp:TempNode)
SET temp.allIds = [rootId], temp.layerIds = [rootId];

Step 2: Perform iterations (to get all subgraph nodes)

CALL apoc.periodic.commit("
  MATCH (temp:TempNode)
  UNWIND temp.layerIds AS id
  MATCH (n:Foo) WHERE n.id = id
  OPTIONAL MATCH (n)-->(next)
  WHERE NOT next.id IN temp.allIds
  WITH temp, COLLECT(DISTINCT next.id) AS layerIds
  SET temp.allIds = temp.allIds + layerIds, temp.layerIds = layerIds
  RETURN SIZE(layerIds);
");

Step 3: Use subgraph ids

MATCH (temp:TempNode)
// ... use temp.allIds, which contains the distinct ids in the subgraph ...
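
The three steps can be modeled in plain Python to check the logic outside the database. This is only a sketch of the same layer-by-layer idea, not APOC code: the dictionary-based graph and the function name subgraph_ids are assumptions made for illustration, while all_ids and layer_ids stand in for temp.allIds and temp.layerIds.

```python
from typing import Dict, List, Set


def subgraph_ids(graph: Dict[str, List[str]], root_id: str) -> Set[str]:
    """Collect the ids reachable from root_id, one layer at a time.

    graph maps a node id to the ids of its outgoing neighbours
    (a hypothetical in-memory stand-in for the Foo nodes).
    """
    all_ids: Set[str] = {root_id}      # Step 1: seeded with the root id
    layer_ids: List[str] = [root_id]   # Step 1: first layer is the root

    # Step 2: repeat until a layer adds nothing new, which mirrors
    # apoc.periodic.commit stopping once the statement returns 0.
    while layer_ids:
        next_layer: List[str] = []
        for node_id in layer_ids:
            for nxt in graph.get(node_id, []):
                if nxt not in all_ids:  # skip nodes seen in any earlier layer
                    all_ids.add(nxt)
                    next_layer.append(nxt)
        layer_ids = next_layer

    # Step 3: all_ids now holds the distinct ids, root included.
    return all_ids


if __name__ == "__main__":
    # Small cyclic example: c points back to the root.
    demo = {"123": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["123", "d"], "d": []}
    print(sorted(subgraph_ids(demo, "123")))
```

Each node is expanded at most once, so the work is bounded by the number of distinct nodes and their outgoing relationships rather than by the number of paths through the cycles.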
