Reputation: 1254
I have a graph of tags, which are related with each other. My goal is to create a Cypher query, which will return all tags that are related to an array of input tags via 1 or 2 hops.
I made a query, which doesn't work quite as intended.
MATCH (t:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t
MATCH (a:Tag)-[:RELATED*1..2]-(t)
RETURN DISTINCT a;
This query first finds the nodes A
, B
, C
and then searches for tags, that are related to A
, B
or C
via 1 node or less.
What I want to do though is to find tags, which are related to ALL three nodes (A
, B
and C
).
I know I could concatenate MATCH
and WITH
statements, and do something like this:
MATCH (t:Tag)-[:RELATED*1..2]-(a:Tag)
WHERE t.name="A"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="B"
WITH DISTINCT a
MATCH (t:Tag)-[:RELATED*1..2]-(a)
WHERE t.name="C"
...
RETURN DISTINCT a;
But it runs painfully slow, when the number of input tags increase (in this case only 3 input tags: A
, B
, C
).
So is there a way to make it in one query, similar to my first try?
Upvotes: 1
Views: 3131
Reputation: 66989
Here is a solution that only requires a single MATCH
clause.
MATCH (t:Tag)-[:RELATED*..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, COLLECT(DISTINCT other) AS others
WITH COLLECT(others) AS coll
RETURN FILTER(x IN coll[0] WHERE ALL(y IN coll[1..] WHERE x IN y)) AS res;
other
) that are "related" (by up to 2 steps) to each of your named tags (t
).other
nodes for each t
. In this example, we end up with 3 others
collections -- 1 for each t
.others
collections into a single coll
collection.others
collection, the query walks through the nodes in first others
collection, and extracts the ones that are also in the remaining others
collections. And, since each others
collection already contains distinct nodes, the result must also have distinct nodes.In addition, if you have a lot of tags, the above query can be made a bit faster by:
:Tag(name)
, and thenSpecifying the use of that index in your query -- by inserting the following clause between the MATCH
and WHERE
clauses. Currently, the Cypher engine does not automatically use the index for this specific query.
USING INDEX t:Tag(name)
Upvotes: 6
Reputation: 105
Here is an alternative:
MATCH shortestPath((t:Tag)<-[:RELATED*1..2]-(source:Tag)) //make sure there are no duplicate paths
WHERE source.name IN ["A","B","C"] AND NOT source.name = t.name //shortest path for identical node would throw an exception
WITH COLLECT(t) as tags //all tags that were reachable, with duplicates for reachable from multiple tags
UNWIND tags as tag //for each tag
WITH tag, tags //using with as match would be a drastic slowdown
WHERE size(filter(t IN tags WHERE ID(t) = ID(tag))) = 3 //if it is connected to all three, it must have been matched three times
RETURN DISTINCT m //since any match will still be in there 3 (or n) times
It first matches all reachable tags. All tags that were reachable from all tags in a list with the length n must have been matched n times if shortestPath is used. If you then filter by that criteria (present n times) the wanted tags can be retrieved with distinct.
Upvotes: 1
Reputation: 39915
How about this one:
MATCH (t:Tag)-[:RELATED*1..2]-(other:Tag)
WHERE t.name IN ["A", "B", "C"]
WITH t, collect(other.name) as others
WHERE ALL(x in ["A","B","C"] WHERE x in others)
RETURN t
The trick is put all the related nodes for t
into a collection (others) and use the ALL
predicate to make sure all of your A,B and C are part of that.
Upvotes: 1