Reputation: 1103
I'm trying to find similar movies on the basis of tags. I also need all the tags for the given movie and for each similar movie (to do some calculations). Surprisingly, collect(h.w) returns repeated values of h.w (where w is a property of h).
Here is the Cypher query. Please help.
MATCH (m:Movie{id:1})-[h1:Has]->(t:Tag)<-[h2:Has]-(sm:Movie),
(m)-[h:Has]->(t0:Tag),
(sm)-[H:Has]->(t1:Tag)
WHERE m <> sm
RETURN distinct(sm), collect(h.w)
Basically a query like
MATCH (x)-[h]->(y), (a)-[H]->(b)
RETURN h
is returning each result for h n times, where n is the number of results for H. Is there any way around this?
Upvotes: 1
Views: 3116
Reputation: 3308
I replicated the data model for this question to help answer it.
I then set up a sample dataset using Neo4j's online console: http://console.neo4j.org/?id=dakmi3
Running the following query from your question:
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
RETURN DISTINCT sm, collect(h.weight)
results in:
(1:Movie {title:"The Matrix: Reloaded"}) [0.31, 0.12, 0.31, 0.12, 0.31, 0.01, 0.31, 0.01]
The issue is that the same relationship h is returned in several rows, which results in duplicated weights in the collection. The solution is to use WITH DISTINCT to reduce the rows to distinct (sm, h) pairs and then return the collection of weights of those relationships.
MATCH (m:Movie { title: "The Matrix" })-[h1:HAS_TAG]->(t:Tag),
(t)<-[h2:HAS_TAG]-(sm:Movie),
(m)-[h:HAS_TAG]->(t0:Tag),
(sm)-[H:HAS_TAG]->(t1:Tag)
WHERE m <> sm
WITH DISTINCT sm, h
RETURN sm, collect(h.weight)
This returns:
(1:Movie {title:"The Matrix: Reloaded"}) [0.31, 0.12, 0.01]
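To make the effect of WITH DISTINCT sm, h concrete, here is a minimal Python sketch that models the rows rather than querying Neo4j. The weights are taken from the first result shown above; the relationship ids (h_a, h_b, h_c) and the function name collect_distinct are hypothetical and only there for illustration.

```python
# Rows as the MATCH might produce them: (similar movie, relationship id, weight).
# Weights come from the first result above; the relationship ids are hypothetical.
weights = [0.31, 0.12, 0.31, 0.12, 0.31, 0.01, 0.31, 0.01]
rel_id = {0.31: "h_a", 0.12: "h_b", 0.01: "h_c"}
rows = [("The Matrix: Reloaded", rel_id[w], w) for w in weights]


def collect_distinct(rows):
    """Model of: WITH DISTINCT sm, h RETURN sm, collect(h.weight)."""
    seen = set()
    collected = {}
    for sm, rid, weight in rows:
        if (sm, rid) in seen:
            continue  # same (movie, relationship) pair already kept
        seen.add((sm, rid))
        collected.setdefault(sm, []).append(weight)
    return collected


result = collect_distinct(rows)
print(result)
```

Note that the sketch removes duplicates by relationship, not by value, so two different relationships that happen to carry the same weight would both be kept. That is also the difference between WITH DISTINCT sm, h and collect(DISTINCT h.weight) in Cypher.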
Upvotes: 2
Reputation: 9952
I'm afraid I still don't quite get your intention, but about the general question of duplicate results, that is just the way a disconnected pattern works. Cypher must consider something like
(:A), (:B)
as one pattern, not two. That means that any satisfying graph structure is considered a distinct match. Suppose you have the graph resulting from
CREATE (:A), (:B), (:B)
and query it for the pattern above, you get two results, namely
neo4j-sh (?)$ MATCH (a:A),(b:B) RETURN *;
==> +-------------------------------+
==> | a | b |
==> +-------------------------------+
==> | Node[15204]{} | Node[15207]{} |
==> | Node[15204]{} | Node[15208]{} |
==> +-------------------------------+
==> 2 rows
==> 53 ms
Similarly, when matching your pattern (x)-[h]->(y), (a)-[H]->(b), Cypher considers each combination of the two pattern parts to make up a unique match for the one whole pattern, so each result for h is repeated once for every result for H.
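The multiplication can be sketched in Python as a Cartesian product. This is only a model of the row semantics, not Neo4j code, and the match names (h1, H1, and so on) are made up for the example.

```python
from itertools import product

# Hypothetical matches for each pattern part (names are illustrative only).
h_matches = ["h1", "h2", "h3"]  # results for (x)-[h]->(y)
H_matches = ["H1", "H2"]        # results for (a)-[H]->(b)

# A disconnected pattern yields every combination of the two parts.
rows = list(product(h_matches, H_matches))

# Returning or collecting h over all rows repeats each h once per H match.
collected_h = [h for h, _ in rows]
counts = {h: collected_h.count(h) for h in h_matches}

print(len(rows))
print(counts)
```

With three matches for h and two for H there are six rows, and each h shows up twice, which is the "n times" effect described in the question.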
This is the way the pattern matching works. To achieve what you want, first consider whether you really need to query for a disconnected pattern. If you do, or if a connected pattern also generates redundant matches, then aggregate one or more of the pattern parts. A simple case might be
CREATE (a:A), (b1:B), (b2:B)
, (c1:C), (c2:C), (c3:C)
, (a)-[:X]->(b1), (a)-[:X]->(b2)
, (a)-[:Y]->(c1), (a)-[:Y]->(c2), (a)-[:Y]->(c3)
queried with
MATCH (b:B)<-[:X]-(a:A)-[:Y]->(c:C) // with 1 (a), 2 (b) and 3 (c) you get 6 matched paths
RETURN a, collect(DISTINCT b) AS bb, collect(DISTINCT c) AS cc // after aggregation by (a) there is one row; DISTINCT drops the repeats inside each collection
Sometimes it makes sense to do the aggregation as an intermediate step
MATCH (b)<-[:X]-(a:A) // 2 paths
WITH a, collect(b) as bb // 1 path
MATCH (a)-[:Y]->(c) // 3 paths
RETURN a, bb, collect(c) as cc // 1 path
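The difference between aggregating once at the end and aggregating as an intermediate step can be modelled in Python. This is a sketch of the row counts only; the string values stand in for the nodes created above, and the variable names are hypothetical.

```python
from itertools import product

# Hypothetical stand-ins for the nodes created above.
a = "a"
bs = ["b1", "b2"]        # reached from a via :X
cs = ["c1", "c2", "c3"]  # reached from a via :Y

# Single MATCH over both branches: one row per (b, c) combination.
rows = [(a, b, c) for b, c in product(bs, cs)]
bb_naive = [b for _, b, _ in rows]  # a plain collect(b) here has 6 entries with repeats

# Staged version: collect b first, so a is back to a single row ...
stage1 = [(a, list(bs))]
# ... then expand c from that single row (3 rows) and collect again (1 row).
stage2 = [(a_, bb, c) for a_, bb in stage1 for c in cs]
result = {}
for a_, bb, c in stage2:
    entry = result.setdefault(a_, {"bb": bb, "cc": []})
    entry["cc"].append(c)

print(len(rows), len(bb_naive))
print(result)
```

Because b is collapsed into a list before c is matched, the second expansion starts from one row instead of two, so neither collection picks up repeated entries.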
Upvotes: 0