Reputation: 1106
I am trying to compare users according to their common interests in this graph.
I know why the following query produces duplicate pairs, but I can't think of a good way to avoid it in Cypher. Is there any way to do it without looping?
neo4j-sh (?)$ start n=node(*) match p=n-[:LIKES]->item<-[:LIKES]-other where n <> other return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
==> +-----------------------------------------------+
==> | n.name | other.name | common | freq |
==> +-----------------------------------------------+
==> | "u1" | "u2" | ["f1","f2","f3"] | 3 |
==> | "u2" | "u1" | ["f1","f2","f3"] | 3 |
==> | "u1" | "u3" | ["f1","f2"] | 2 |
==> | "u3" | "u2" | ["f1","f2"] | 2 |
==> | "u2" | "u3" | ["f1","f2"] | 2 |
==> | "u3" | "u1" | ["f1","f2"] | 2 |
==> | "u4" | "u3" | ["f1"] | 1 |
==> | "u4" | "u2" | ["f1"] | 1 |
==> | "u4" | "u1" | ["f1"] | 1 |
==> | "u2" | "u4" | ["f1"] | 1 |
==> | "u1" | "u4" | ["f1"] | 1 |
==> | "u3" | "u4" | ["f1"] | 1 |
==> +-----------------------------------------------+
Upvotes: 3
Views: 1896
Reputation: 6331
To avoid duplicates of the form a--b and b--a, you can exclude one of the two combinations in your WHERE clause with
WHERE ID(a) < ID(b)
which makes your query above:
start n=node(*) match p=n-[:LIKES]->item<-[:LIKES]-other where ID(n) < ID(other) return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
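To see why the ID comparison keeps exactly one row per pair, here is a minimal Python sketch of the same filtering logic outside Neo4j. The `likes` data is an assumption reconstructed from the result table in the question, and `node_id` is a hypothetical stand-in for Neo4j's internal node IDs; any total order over the nodes would behave the same way.

```python
from itertools import product

# Assumed LIKES edges, reconstructed from the result table in the question.
likes = {
    "u1": {"f1", "f2", "f3"},
    "u2": {"f1", "f2", "f3"},
    "u3": {"f1", "f2"},
    "u4": {"f1"},
}

# Hypothetical stand-in for Neo4j's internal node IDs: any total order works.
node_id = {name: i for i, name in enumerate(sorted(likes))}


def common_interests(keep):
    """Return (n, other, common, freq) rows for every ordered pair that `keep` accepts."""
    rows = []
    for n, other in product(likes, repeat=2):
        if not keep(n, other):
            continue
        common = sorted(likes[n] & likes[other])
        if common:
            rows.append((n, other, common, len(common)))
    # Mirror the query's "order by freq desc" (the sort is stable).
    rows.sort(key=lambda row: -row[3])
    return rows


# where n <> other: every pair shows up twice, once in each direction.
with_duplicates = common_interests(lambda n, o: n != o)

# where ID(n) < ID(other): only the direction with the smaller ID first survives.
deduplicated = common_interests(lambda n, o: node_id[n] < node_id[o])

for row in deduplicated:
    print(row)
```

With `n <> other`, both (a, b) and (b, a) pass the filter. With a strict `<` on a total order, exactly one of the two can pass, so each pair is reported once.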
Upvotes: 13
Reputation: 1053
OK, I see that you use node(*) as the start point, which means the query loops through the whole graph and uses every node as a start point in turn. So the rows in the output are different, not duplicates as you say:
+-----------------------------------------------+
| n.name | other.name | common | freq |
+-----------------------------------------------+
| "u2" | "u1" | ["f1","f2","f3"] | 3 |
not equal to:
+-----------------------------------------------+
| n.name | other.name | common | freq |
+-----------------------------------------------+
| "u1" | "u2" | ["f1","f2","f3"] | 3 |
So if you use an index to set a single start point, there won't be any duplicates:
start n=node:someIndex(name='C') match p=n-[:LIKES]->item<-[:LIKES]-other where n <> other return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
Upvotes: 0