mlo0424

Reputation: 439

Neo4j and Cypher: find all reachable nodes from a set of nodes with some constraints

I have a graph containing two kinds of nodes: user nodes and IP nodes.

The only edge I have is :LOGGED_IN from user node to IP node.

What I am trying to find is all reachable user nodes from one user node.

So I have a Cypher query like this:

MATCH (u: User)-[*]-(connected: User) 
WHERE u.user_id = 'xxxxxxxxxxx'
RETURN distinct u, connected

However, I found that some IP nodes may be proxy IPs, in which case more than 100 :LOGGED_IN edges point to that single IP node.

I am looking for a way to find all reachable user nodes while skipping every path that passes through a proxy IP.

The definition of a proxy IP node should also be configurable. For example, I could set a threshold of 1000 :LOGGED_IN edges: if an IP has more than 1000 incoming edges, then it is a proxy IP.

Upvotes: 2

Views: 857

Answers (3)

InverseFalcon

Reputation: 30397

An alternative to Bruno's solution (in case you have a great many proxy IP nodes) is to add a WHERE clause that excludes those proxy nodes during the expansion.

match p = (u:User)-[*]-(connected:User) 
where u.user_id = 'xxxxxxxxxxx'
and none(node in nodes(p) where node:ip and size((node)<-[:LOGGED_IN]-()) > 1000)
return distinct u, connected

The none() function will be evaluated during expansion, not in a filter after the expansion, which should work for you.
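
To make "evaluated during expansion" concrete, here is a minimal Python sketch of the same idea outside Neo4j. It is only an illustration, not how Cypher is executed internally: the function name reachable_users, the (user, ip) pair format, and the threshold argument are all made up for this example. An IP counts as a proxy when it has more than threshold logins, following the definition in the question, and a proxy is never expanded, so nothing behind it is explored.

```python
from collections import Counter, defaultdict, deque


def reachable_users(logins, start_user, threshold):
    """Return users linked to start_user through non-proxy IPs only.

    logins is an iterable of (user, ip) pairs, one per LOGGED_IN edge.
    An IP with more than `threshold` edges is treated as a proxy.
    """
    login_count = Counter()          # number of LOGGED_IN edges per IP
    users_by_ip = defaultdict(set)   # IP -> users that logged in from it
    ips_by_user = defaultdict(set)   # user -> IPs the user logged in from
    for user, ip in logins:
        login_count[ip] += 1
        users_by_ip[ip].add(user)
        ips_by_user[user].add(ip)

    seen = {start_user}
    queue = deque([start_user])
    while queue:
        user = queue.popleft()
        for ip in ips_by_user[user]:
            # Prune during expansion: a proxy IP is skipped before any of
            # its neighbours are looked at.
            if login_count[ip] > threshold:
                continue
            for other in users_by_ip[ip]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    # The start user is not reported as connected to itself.
    return seen - {start_user}
```

With the logins a/ip1, b/ip1, b/proxy, c/proxy, d/proxy, a threshold of 2 stops the walk at the proxy and only b is returned for a; raising the threshold to 3 lets the walk continue through it to c and d.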

One other thing you could try is using the expansion procs from APOC Procedures, some of which are optimized for only finding distinct nodes instead of finding all possible paths to the same nodes.

match (u:User)
where u.user_id = 'xxxxxxxxxxx'
call apoc.path.subgraphNodes(u, {labelFilter:'>User'}) yield node as connected
return u, connected

This one can't currently be optimized to exclude proxy ip nodes, but the NODE_GLOBAL uniqueness used during expansion may make up for it.
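
As a rough illustration of why NODE_GLOBAL uniqueness can help, the Python sketch below compares two ways of walking a small user/IP graph. It is a toy model under my own assumptions, not APOC's implementation: count_paths enumerates every path that never reuses a relationship (which is what a variable-length pattern has to consider), while count_visits expands each node at most once. The helper names and the ("User", name) / ("IP", name) node keys are hypothetical.

```python
from collections import defaultdict


def build_adjacency(logins):
    """Build an undirected adjacency list; each (user, ip) pair is one edge."""
    adj = defaultdict(list)
    for edge_id, (user, ip) in enumerate(logins):
        adj[("User", user)].append((edge_id, ("IP", ip)))
        adj[("IP", ip)].append((edge_id, ("User", user)))
    return adj


def count_paths(adj, start):
    """Count non-empty paths from start that never reuse a relationship."""
    def walk(node, used):
        total = 0
        for edge_id, nxt in adj[node]:
            if edge_id not in used:
                # One path ends at nxt, plus every longer path through it.
                total += 1 + walk(nxt, used | {edge_id})
        return total
    return walk(start, frozenset())


def count_visits(adj, start):
    """Count nodes reached when every node is expanded at most once."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for _, nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1  # do not count the start node itself
```

On a graph where three users have each logged in from the same three IPs, count_visits reaches the five other nodes once each, while count_paths has to walk far more paths to arrive at those same nodes. That gap is what grows quickly around heavily shared IPs.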

Upvotes: 1

cybersam

Reputation: 66999

This is another version of @BrunoPeres' query that is fixed to correctly find paths without any proxy nodes.

Also, unlike @InverseFalcon's first query, this one checks the degree of each ip node once, instead of checking the label and degree of every node in every path. Which approach is better depends on your DB's data characteristics.

MATCH (i:ip)
WHERE SIZE(()-[:LOGGED_IN]->(i)) > 1000
WITH COLLECT(i) AS proxies
MATCH path = (u:User)-[*]-(connected:User) 
WHERE u.user_id = 'xxxxxxxxxxx' AND NONE(p IN proxies WHERE p IN nodes(path))
RETURN DISTINCT u, connected
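
To show why collecting the proxies into one list matters, here is a small Python sketch of the two filtering styles. It is only a model of the row semantics, with paths written as plain tuples and made-up names: per_proxy_filter mimics running the path match once per proxy row and testing only that proxy, while collected_filter mimics testing each path against the whole list.

```python
def per_proxy_filter(paths, proxies):
    """One pass per proxy row; a path is kept if it avoids that one proxy."""
    kept = set()
    for proxy in proxies:
        for path in paths:
            if proxy not in path:
                kept.add(path)
    return kept


def collected_filter(paths, proxies):
    """A path is kept only if it avoids every proxy in the list."""
    return {
        path
        for path in paths
        if not any(proxy in path for proxy in proxies)
    }


# Hypothetical sample data: each tuple is the node sequence of one path.
paths = {
    ("alice", "ip_home", "bob"),
    ("alice", "proxy_a", "carol"),
    ("alice", "proxy_b", "dave"),
}
proxies = ["proxy_a", "proxy_b"]
```

With two proxies, per_proxy_filter keeps all three paths, because the path through proxy_a passes the check made for proxy_b and vice versa; collected_filter keeps only the path through ip_home. With an empty proxy list, the per-proxy version returns nothing at all, while the collected version returns every path.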

Upvotes: 0

Bruno Peres

Reputation: 16365

Try this:

match (i:ip)
where size(()-[:LOGGED_IN]->(i)) > 1000
match p = (u:User)-[*]-(connected:User) 
where u.user_id = 'xxxxxxxxxxx'
and not i in nodes(p)
return distinct u, connected

That is: get all IP nodes with more than 1000 :LOGGED_IN relationships. Then, get all paths that do not contain these nodes and return the desired data.

Upvotes: 0
