jpadilladev
jpadilladev

Reputation: 1936

Neo4j graph backtracking algorithm

I have a complex question about Neo4j and what Traversal can do.

Imagine you have the following Neo4j graph

Graph

My idea is to traverse the whole graph, and if I find a 'false' node, expand this status to his neighbours and so on, and finally in this example we will have all nodes with a 'false' status. (In real life, I have more conditions to set this status to true or false while traversing, but I simplified it a bit for the question)

I think I need some backtracking algorithm to do this, but in Neo4j I don't know how to do this, or if is it even possible. In addition, this graph could be a very huge graph.

How would yo do this with Java and Neo4j?

Thanks.

Upvotes: 3

Views: 641

Answers (2)

InverseFalcon
InverseFalcon

Reputation: 30397

For efficient matching to reachable nodes, there are two options that tend to work well.

With Neo4j 3.2.x, there is an efficient means to match to all distinct reachable nodes through a variable relationship match plus usage of DISTINCT, but it requires an upper bound on the variable-length relationship. Using a suitably high number should work. Something like:

MATCH (:SomeLabel{someProperty:false})-[*..999999]->(x)
WITH distinct x
SET x.someProperty = false

Otherwise, APOC Procedures offers apoc.path.subgraphNodes() which also does efficient matching to reachable nodes in a subgraph:

MATCH (start:SomeLabel{someProperty:false})
CALL apoc.path.subgraphNodes(start, {}) YIELD node
SET node.someProperty = false;

EDIT

To add more detail for the first option (why not just use *, and why use DISTINCT), keep in mind that by default Cypher will match to all possible paths when we use *, even if those paths end at the same node as a previously matched path. This can become inefficient in a sufficiently connected graph (when we don't have a reasonable upper bound, and we're not using LIMIT), with the possibility of blowing your heap or hanging indefinitely.

This is especially to be avoided when we aren't interested in all possible paths, just all possible nodes that are reachable.

In Neo4j 3.2, an optimization was introduced called pruning-var expand, which changes the traversal logic in the case when all of the following are true:

  1. We have a var-length expansion
  2. We aren't referencing the path in any way (such as by setting a path variable to the match pattern, or setting a variable on the var-length relationship)
  3. We have an upper-bound on the var-length expansion
  4. We ask for DISTINCT nodes or values obtainable from the expansion

Basically when the query is such that it is clear that we want distinct nodes (or values from distinct nodes) from a var-length expansion and don't care about the paths.

The planner will then use the pruning var expand (you can confirm by checking the query plan from EXPLAIN or PROFILE) to efficiently match to reachable nodes.

Upvotes: 2

Bruno Peres
Bruno Peres

Reputation: 16355

I don't know if I understood your question completely, but I believe that a simple Cypher query is enough. Something like:

MATCH ({someProperty : false})-[*]-(neighbours)
SET neighbours.someProperty = false

Upvotes: 0

Related Questions