mlo0424

Reputation: 439

Neo4j: all nodes reachable through a specific relationship from a specific node

I would like to find out all reachable nodes via one specific relationship starting from a node.

I have the following graph model:

(User) --[LOGGED_IN]--> (Ip)
(User) --[FRIEND]--> (User)

I would like to find all User nodes reachable through LOGGED_IN relationships. For example:

    user1 logged_in ip1
    user2 logged_in ip1
    user2 logged_in ip2
    user3 logged_in ip2
    user3 logged_in ip3
    user4 logged_in ip3
    user5 logged_in ip4
    user1 friend user5

If I start from user1, I want to find user1, user2, user3 and user4. I would like to ignore the FRIEND relationship, so user5 (linked to user1 only through FRIEND) should not be returned.

I know that if I only had the [:LOGGED_IN] relationship, I could use the Cypher query below. But I also have the FRIEND relationship, so this query would also return users linked by [:FRIEND]:

MATCH (u:User)-[*]->(connected:User)
WHERE u.user_id = <user1_id>
RETURN connected
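To make the difference concrete outside Neo4j, here is a small plain-Python sketch (not Cypher; EDGES, reachable() and users() are hypothetical names for illustration) that loads the sample data and runs a breadth-first search from user1, once following every relationship type and once following only LOGGED_IN. Direction is ignored in both searches, and node names starting with "user" stand in for the :User label.

```python
from collections import deque

# Sample data from the question, as (start, relationship type, end) triples.
EDGES = [
    ("user1", "LOGGED_IN", "ip1"),
    ("user2", "LOGGED_IN", "ip1"),
    ("user2", "LOGGED_IN", "ip2"),
    ("user3", "LOGGED_IN", "ip2"),
    ("user3", "LOGGED_IN", "ip3"),
    ("user4", "LOGGED_IN", "ip3"),
    ("user5", "LOGGED_IN", "ip4"),
    ("user1", "FRIEND", "user5"),
]


def reachable(start, allowed_types=None):
    """Breadth-first search that ignores relationship direction.

    allowed_types=None follows every relationship type (like [*]);
    otherwise only relationships whose type is in allowed_types are followed.
    """
    neighbours = {}
    for a, rel, b in EDGES:
        if allowed_types is None or rel in allowed_types:
            neighbours.setdefault(a, []).append(b)
            neighbours.setdefault(b, []).append(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:  # visit each node only once
                seen.add(nxt)
                queue.append(nxt)
    return seen


def users(nodes):
    # Hypothetical stand-in for the :User label: names starting with "user".
    return sorted(n for n in nodes if n.startswith("user"))


print(users(reachable("user1")))                 # ['user1', 'user2', 'user3', 'user4', 'user5']
print(users(reachable("user1", {"LOGGED_IN"})))  # ['user1', 'user2', 'user3', 'user4']
```

Following every type also reaches user5 through FRIEND, while the LOGGED_IN-only search stops at user4, which is the result the question asks for.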

Upvotes: 1

Views: 675

Answers (2)

InverseFalcon

Reputation: 30397

If your nodes are deeply interconnected, Cypher alone may not work out for you. A MATCH with a variable-length pattern finds every possible path that fits the pattern, and in a densely connected graph the number of possible paths quickly goes through the roof. That isn't a good fit when you only care about the distinct nodes you can reach.
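A rough illustration of that blow-up, as a plain-Python sketch on a hypothetical fully connected graph (every node linked to every other; complete_graph(), count_simple_paths() and count_reachable() are illustrative names): it counts the paths from one node that never revisit a node, versus the distinct nodes reachable from it. Cypher's default matching only forbids reusing a relationship within one path, so it can produce even more paths than this count.

```python
from itertools import combinations


def complete_graph(n):
    """Adjacency sets for an undirected graph where every pair of nodes is linked."""
    adj = {i: set() for i in range(n)}
    for a, b in combinations(range(n), 2):
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_simple_paths(adj, start):
    """Count paths of length >= 1 from start that never revisit a node (DFS)."""
    def dfs(node, visited):
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                # one path ending at nxt, plus every longer path extending it
                total += 1 + dfs(nxt, visited | {nxt})
        return total
    return dfs(start, {start})


def count_reachable(adj, start):
    """Count distinct nodes reachable from start, visiting each node once."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1  # exclude the start node itself


for n in (4, 6, 8):
    g = complete_graph(n)
    print(n, count_simple_paths(g, 0), count_reachable(g, 0))
# 4 15 3
# 6 325 5
# 8 13699 7
```

The distinct-node count grows by one per added node while the path count grows factorially, which is why an expander that visits each node once scales much better for this kind of question.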

If you have access to APOC Procedures, there are some path expander procedures that are optimized toward finding connected nodes. After installing and configuring APOC, give this a try:

MATCH (u:User)
WHERE u.user_id = <user1_id>
CALL apoc.path.subgraphNodes(u, {relationshipFilter:'LOGGED_IN', labelFilter:'>User', filterStartNode:true}) YIELD node as connected
RETURN connected;

Upvotes: 1

cybersam

Reputation: 66999

This should work (with the appropriate value for <user1_id>):

MATCH (u:User)-[:logged_in*0..]-(connected:User)
WHERE u.user_id = <user1_id>
RETURN DISTINCT connected;

The (u:User)-[:logged_in*0..]-(connected:User) pattern:

  • requires all matched relationships to be of type logged_in (relationship types are case-sensitive, so use the type name actually stored in the graph, e.g. LOGGED_IN as in the question's model).
  • specifies a lower bound of 0 for the variable-length path pattern, which allows the u node itself to be assigned to connected.
  • does not specify a direction for the logged_in relationship, to permit traversals from Ip nodes to User nodes (and vice versa).
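Why the direction matters here, as a small plain-Python sketch (the LOGGED_IN list and next_hops() are hypothetical names for illustration): every LOGGED_IN relationship points from a User to an Ip, so a directed walk that starts at a User stops at the first Ip, while ignoring direction lets the walk continue from an Ip back out to the other users who logged in from it.

```python
# LOGGED_IN relationships from the question, stored in the direction they
# point: (:User)-[:LOGGED_IN]->(:Ip).
LOGGED_IN = [
    ("user1", "ip1"), ("user2", "ip1"), ("user2", "ip2"),
    ("user3", "ip2"), ("user3", "ip3"), ("user4", "ip3"),
    ("user5", "ip4"),
]


def next_hops(node, directed):
    """Nodes one LOGGED_IN relationship away from node.

    directed=True only follows a relationship from its start to its end node;
    directed=False also follows it backwards, like an undirected -[...]- pattern.
    """
    hops = {ip for user, ip in LOGGED_IN if user == node}
    if not directed:
        hops |= {user for user, ip in LOGGED_IN if ip == node}
    return sorted(hops)


print(next_hops("user1", directed=True))  # ['ip1']
print(next_hops("ip1", directed=True))    # []  (a directed walk dead-ends here)
print(next_hops("ip1", directed=False))   # ['user1', 'user2']
```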

The DISTINCT keyword is used to eliminate duplicate results.

This query will always return the u node (if it exists), since a node is trivially reachable from itself.

[UPDATED]

If you have a lot of data, the variable-length path pattern will need a reasonable upper bound (e.g., [:logged_in*0..5]) to avoid running out of memory or having the query take forever to complete.
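One thing to keep in mind when choosing the bound: it counts relationships, and each step from one user to another goes through an Ip node, so it costs two relationships. A plain-Python breadth-first sketch over the question's sample data (hop_distances() is a hypothetical helper) lists each node's fewest-relationship distance from user1 and shows that with an upper bound of 5, user4 (six relationships away) would be missed.

```python
from collections import deque

# Sample LOGGED_IN data from the question, as (user, ip) pairs; direction is
# ignored below, matching the undirected pattern in the query.
LOGGED_IN = [
    ("user1", "ip1"), ("user2", "ip1"), ("user2", "ip2"),
    ("user3", "ip2"), ("user3", "ip3"), ("user4", "ip3"),
    ("user5", "ip4"),
]


def hop_distances(start):
    """BFS: fewest LOGGED_IN relationships from start to each reachable node."""
    adj = {}
    for user, ip in LOGGED_IN:
        adj.setdefault(user, []).append(ip)
        adj.setdefault(ip, []).append(user)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


dist = hop_distances("user1")
user_dist = {n: d for n, d in dist.items() if n.startswith("user")}
print(user_dist)  # {'user1': 0, 'user2': 2, 'user3': 4, 'user4': 6}
print(sorted(n for n, d in user_dist.items() if d <= 5))  # ['user1', 'user2', 'user3']
```

So the upper bound should be about twice the number of user-to-user steps you want to follow.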

Upvotes: 3
