Reputation: 439
I have a graph containing two kinds of nodes: user nodes and IP nodes.
The only edge I have is :LOGGED_IN, from a user node to an IP node.
What I am trying to find is all user nodes reachable from a given user node.
So I have a Cypher query like this:
MATCH (u: User)-[*]-(connected: User)
WHERE u.user_id = 'xxxxxxxxxxx'
RETURN distinct u, connected
However, I found that some IP nodes could potentially be proxy IPs, so there may be more than 100 :LOGGED_IN edges to such an IP node.
I am looking for a way to find all reachable user nodes while skipping any path that goes through a proxy IP.
Also, the definition of a proxy IP node should be configurable: for example, I could set a threshold of 1000 :LOGGED_IN edges, so that an IP with more than 1000 incoming edges counts as a proxy IP.
Upvotes: 2
Views: 857
Reputation: 30397
An alternative to Bruno's solution (in case you have a great many proxy IP nodes) is to add a WHERE clause that excludes any of those proxy nodes during the expansion.
match p = (u:User)-[*]-(connected:User)
where u.user_id = 'xxxxxxxxxxx'
and none(node in nodes(p) where node:ip and size((node)<-[:LOGGED_IN]-()) >= 1000)
return distinct u, connected
The none() function will be evaluated during the expansion, not as a filter after the expansion, which should work for you.
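The question also asks for a configurable threshold. Below is a minimal sketch of the same query with the threshold and the user id passed in as query parameters; the parameter names $threshold and $userId are my own, not from the original answer. It uses > instead of >= to match the question's "more than 1000" definition, and names the :LOGGED_IN type explicitly, which is equivalent here because it is the only relationship type in the graph.

```cypher
// $threshold and $userId are hypothetical parameter names.
// In Neo4j Browser they could be set first with:
//   :param threshold => 1000
//   :param userId => 'xxxxxxxxxxx'
// A path is kept only if none of its ip nodes has more than $threshold
// incoming :LOGGED_IN relationships.
MATCH p = (u:User)-[:LOGGED_IN*]-(connected:User)
WHERE u.user_id = $userId
  AND none(node IN nodes(p) WHERE node:ip AND size((node)<-[:LOGGED_IN]-()) > $threshold)
RETURN DISTINCT u, connected
```

From application code, a driver would send the same two parameters alongside the query string, so changing the proxy threshold does not require editing the query itself.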
One other thing you could try is the path expansion procedures from APOC Procedures, some of which are optimized to find only distinct nodes instead of all possible paths to the same nodes.
match (u:User)
where u.user_id = 'xxxxxxxxxxx'
call apoc.path.subgraphNodes(u, {labelFilter:'>User'}) yield node as connected
return u, connected
This one can't currently be made to exclude proxy IP nodes, but the NODE_GLOBAL uniqueness used during the expansion (each node is visited at most once) may make up for it.
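Later APOC releases added node deny lists to the path expander config, which would let subgraphNodes skip the proxies directly. The sketch below assumes your APOC version accepts the blacklistNodes option (later versions rename it denylistNodes, so check the documentation for your release), precomputes the proxy list the same way the other answers do, and again uses the hypothetical $threshold and $userId parameters.

```cypher
// Assumes an APOC release whose expander config accepts blacklistNodes
// (renamed denylistNodes in later releases); $threshold and $userId are
// hypothetical parameters.
MATCH (i:ip)
WHERE size(()-[:LOGGED_IN]->(i)) > $threshold
// collect() still returns one row (an empty list) when no proxies exist
WITH collect(i) AS proxies
MATCH (u:User)
WHERE u.user_id = $userId
// nodes in the deny list are never entered, so no traversal passes through a proxy
CALL apoc.path.subgraphNodes(u, {
  relationshipFilter: 'LOGGED_IN',
  labelFilter: '>User',
  blacklistNodes: proxies
}) YIELD node AS connected
RETURN u, connected
```

This keeps the NODE_GLOBAL uniqueness mentioned above while also pruning the proxy IPs during the expansion.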
Upvotes: 1
Reputation: 66999
This is another version of @BrunoPeres' query, fixed so that it correctly finds only paths without any proxy nodes.
Also, unlike @InverseFalcon's first query, this one checks the degree of each ip node only once, instead of checking the (label and) degree of every node in every path. Which approach is better depends on your DB's data characteristics.
MATCH (i:ip)
WHERE SIZE(()-[:LOGGED_IN]->(i)) >= 1000
WITH COLLECT(i) AS proxies
MATCH path = (u:User)-[*]-(connected:User)
WHERE u.user_id = 'xxxxxxxxxxx' AND NONE(p IN proxies WHERE p IN nodes(path))
RETURN DISTINCT u, connected
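On Neo4j 5, size() over a pattern such as size(()-[:LOGGED_IN]->(i)) is no longer accepted. A sketch of the same query for a Neo4j 5 release that supports COUNT subqueries, with the hypothetical $threshold and $userId parameters in place of the hard-coded values:

```cypher
// Neo4j 5 sketch: COUNT { ... } replaces size() over a pattern.
// $threshold and $userId are hypothetical parameters.
MATCH (i:ip)
WHERE COUNT { ()-[:LOGGED_IN]->(i) } > $threshold
// one row holding every proxy ip node (an empty list if there are none)
WITH collect(i) AS proxies
MATCH path = (u:User)-[:LOGGED_IN*]-(connected:User)
WHERE u.user_id = $userId
  AND none(p IN proxies WHERE p IN nodes(path))
RETURN DISTINCT u, connected
```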
Upvotes: 0
Reputation: 16365
Try it:
match (i:ip)
where size(()-[:LOGGED_IN]->(i)) > 1000
match p = (u:User)-[*]-(connected:User)
where u.user_id = 'xxxxxxxxxxx'
and not i in nodes(p)
return distinct u, connected
That is: get all IP nodes with more than 1000 :LOGGED_IN relationships. Then get all paths that do not contain these nodes, and return the desired data.
Upvotes: 0