Reputation: 191
I have a big neo4j db with info about celebs, all of them have relations with many others, they are linked, dated, married to each other. So I need to get random path from one celeb with defined count of relations (5). I don't care who will be in this chain, the only condition I have I shouldn't have repeated celebs in chain.
To be more clear: I need to get "new" chain after each query, for example:
Is that possible to do it by query?
Upvotes: 0
Views: 245
Reputation: 5047
This is doable in Cypher, but it's quite tricky. You mention that
the only condition I have I shouldn't have repeated celebs in chain.
This condition could be captured by using node-isomorphic pattern matching, which requires all nodes in a path to be unique. Unfortunately, this is not yet supported in Cypher. It is proposed as part of the openCypher project, but is still work-in-progress. Currently, Cypher only supports relationship uniqueness, which is not enough for this use case as there are multiple relationship types (e.g. A is married to B, but B also collaborated with A, so we already have a duplicate with only two nodes).
APOC solution. If you can use the APOC library, take a look at the path expander, which supports various uniqueness constraints, including NODE_GLOBAL
.
Plain Cypher solution. To work around this limitation, you can capture the node uniqueness constraint with a filtering operation:
MATCH p = (c1:Celebrity {name: 'Rita Ora'})-[*5]-(c2:Celebrity)
UNWIND nodes(p) AS node
WITH p, count(DISTINCT node) AS countNodes
WHERE countNodes = 5
RETURN p
LIMIT 1
Performance-wise this should be okay as long as you limit its results because the query engine will basically keep enumerating new paths until one of them passes the filtering test.
The goal of the UNWIND nodes(p) AS node WITH count(DISTINCT node) ...
construct is to remove duplicates from the list of nodes by first UNWIND-ing it to separate rows, then aggregating them to a unique collection using DISTINCT
. We then check whether the list of unique nodes still has 5 elements - if so, the original list was also unique and we RETURN
the results.
Note. Instead of UNWIND
and count(DISTINCT ...)
, getting unique elements from a list could be expressed in other ways:
(1) Using a list comprehension and ranges:
WITH [1, 2, 2, 3, 2] AS l
RETURN [i IN range(0, length(l)-1) WHERE NOT l[i] IN l[0..i] | l[i]]
(2) Using reduce:
WITH [1, 2, 2, 3, 2] AS l
RETURN reduce(acc = [], i IN l | acc + CASE NOT i IN acc WHEN true THEN [i] ELSE [] END)
However, I believe both forms are less readable than the original one.
Upvotes: 1