lmazgon
lmazgon

Reputation: 1254

Neo4j match nodes related to all nodes in collection

I have a graph of tags, which are related with each other. My goal is to create a Cypher query, which will return all tags that are related to an array of input tags via 1 or 2 hops.

I made a query, which doesn't work quite as intended.

MATCH (t:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t
MATCH (a:Tag)-[:RELATED*1..2]-(t)
RETURN DISTINCT a;

This query first finds the nodes A, B, C and then searches for tags, that are related to A, B or C via 1 node or less.

What I want to do though is to find tags, which are related to ALL three nodes (A, B and C).


I know I could concatenate MATCH and WITH statements, and do something like this:

MATCH (t:Tag)-[:RELATED*1..2]-(a:Tag)
WHERE t.name="A"

WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="B"

WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="C"
...
RETURN DISTINCT a;

But it runs painfully slow, when the number of input tags increase (in this case only 3 input tags: A, B, C).


So is there a way to make it in one query, similar to my first try?

Upvotes: 1

Views: 3131

Answers (3)

cybersam
cybersam

Reputation: 66989

Here is a solution that only requires a single MATCH clause.

MATCH (t:Tag)-[:RELATED*..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, COLLECT(DISTINCT other) AS others
WITH COLLECT(others) AS coll
RETURN FILTER(x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y)) AS res;
  • The query finds all the tags (other) that are "related" (by up to 2 steps) to each of your named tags (t).
  • It then uses aggregation to collect the distinct other nodes for each t. In this example, we end up with 3 others collections -- 1 for each t.
  • It then collects all the others collections into a single coll collection.
  • Finally, since the result set is supposed to be the intersection of every others collection, the query walks through the nodes in first others collection, and extracts the ones that are also in the remaining others collections. And, since each others collection already contains distinct nodes, the result must also have distinct nodes.

In addition, if you have a lot of tags, the above query can be made a bit faster by:

  1. Creating an index (or uniqueness constraint, which automatically creates an index for you) on :Tag(name), and then
  2. Specifying the use of that index in your query -- by inserting the following clause between the MATCH and WHERE clauses. Currently, the Cypher engine does not automatically use the index for this specific query.

    USING INDEX t:Tag(name)
    

Upvotes: 6

Zain
Zain

Reputation: 105

Here is an alternative:

MATCH shortestPath((t:Tag)<-[:RELATED*1..2]-(source:Tag)) //make sure there are no duplicate paths 
WHERE source.name IN ["A","B","C"] AND NOT source.name = t.name //shortest path for identical node would throw an exception
WITH COLLECT(t) as tags //all tags that were reachable, with duplicates for reachable from multiple tags
UNWIND tags as tag //for each tag
WITH tag, tags //using with as match would be a drastic slowdown
WHERE size(filter(t IN tags WHERE ID(t) = ID(tag))) = 3 //if it is connected to all three, it must have been matched three times
RETURN DISTINCT m //since any match will still be in there 3 (or n) times

It first matches all reachable tags. All tags that were reachable from all tags in a list with the length n must have been matched n times if shortestPath is used. If you then filter by that criteria (present n times) the wanted tags can be retrieved with distinct.

Upvotes: 1

Stefan Armbruster
Stefan Armbruster

Reputation: 39915

How about this one:

MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"] 
WITH t, collect(other.name) as others
WHERE ALL(x in ["A","B","C"] WHERE x in others)
RETURN t

The trick is put all the related nodes for t into a collection (others) and use the ALL predicate to make sure all of your A,B and C are part of that.

Upvotes: 1

Related Questions