user1848018

Reputation: 1106

Query unique pairs of nodes in Cypher when pair order is not important

I am trying to compare users according to their common interests in this graph.
I know why the following query produces duplicate pairs, but I can't think of a good way to avoid them in Cypher. Is there any way to do it without looping?

neo4j-sh (?)$ start n=node(*) match p=n-[:LIKES]->item<-[:LIKES]-other where n <> other return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
==> +-----------------------------------------------+
==> | n.name | other.name | common           | freq |
==> +-----------------------------------------------+
==> | "u1"   | "u2"       | ["f1","f2","f3"] | 3    |
==> | "u2"   | "u1"       | ["f1","f2","f3"] | 3    |
==> | "u1"   | "u3"       | ["f1","f2"]      | 2    |
==> | "u3"   | "u2"       | ["f1","f2"]      | 2    |
==> | "u2"   | "u3"       | ["f1","f2"]      | 2    |
==> | "u3"   | "u1"       | ["f1","f2"]      | 2    |
==> | "u4"   | "u3"       | ["f1"]           | 1    |
==> | "u4"   | "u2"       | ["f1"]           | 1    |
==> | "u4"   | "u1"       | ["f1"]           | 1    |
==> | "u2"   | "u4"       | ["f1"]           | 1    |
==> | "u1"   | "u4"       | ["f1"]           | 1    |
==> | "u3"   | "u4"       | ["f1"]           | 1    |
==> +-----------------------------------------------+ 

Upvotes: 3

Views: 1896

Answers (2)

Peter Neubauer

Reputation: 6331

In order to avoid having duplicates in the form of a--b and b--a, you can exclude one of the combinations in your WHERE clause with

WHERE ID(a) < ID(b)

making your above query

start n=node(*) match p=n-[:LIKES]->item<-[:LIKES]-other where ID(n) < ID(other) return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
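To see why the ID comparison keeps exactly one row per unordered pair, here is a small Python simulation of the two WHERE clauses. It is only a sketch: the `likes` data is reconstructed from the output in the question, and the `node_id` values are assumed stand-ins for Neo4j's internal node ids, not values taken from the real graph.

```python
from itertools import permutations

# Hypothetical data mirroring the question's output: user -> liked items.
likes = {
    "u1": {"f1", "f2", "f3"},
    "u2": {"f1", "f2", "f3"},
    "u3": {"f1", "f2"},
    "u4": {"f1"},
}
# Assumed stand-ins for Neo4j internal node ids.
node_id = {"u1": 1, "u2": 2, "u3": 3, "u4": 4}


def common_pairs(keep):
    """Return (n, other, common, freq) rows for ordered pairs passing `keep`."""
    rows = []
    for n, other in permutations(likes, 2):
        if not keep(n, other):
            continue
        common = sorted(likes[n] & likes[other])
        if common:
            rows.append((n, other, common, len(common)))
    # Same ordering as "order by freq desc" (ties keep generation order).
    rows.sort(key=lambda r: -r[3])
    return rows


# where n <> other: every pair appears twice, once in each direction.
all_rows = common_pairs(lambda n, other: n != other)
# where ID(n) < ID(other): only one direction of each pair survives.
unique_rows = common_pairs(lambda n, other: node_id[n] < node_id[other])

print(len(all_rows), len(unique_rows))
for row in unique_rows:
    print(row)
```

Because two distinct nodes never share an id, exactly one of `ID(n) < ID(other)` and `ID(other) < ID(n)` holds for any pair, so the filter drops the mirrored row without losing any pair.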

Upvotes: 13

Mohamed Ismail Mansour

Reputation: 1053

OK, I see that you use node(*) as the start point, which means the query loops through the whole graph and uses each node as a start point in turn. So the rows in the output are different, not duplicates as you say:

+-----------------------------------------------+
| n.name | other.name | common           | freq |
+-----------------------------------------------+
| "u2"   | "u1"       | ["f1","f2","f3"] | 3    |

is not equal to:

+-----------------------------------------------+
| n.name | other.name | common           | freq |
+-----------------------------------------------+
| "u1"   | "u2"       | ["f1","f2","f3"] | 3    |

So, if you use an index lookup to set a single start point, there won't be any duplicates.

start n=node:someIndex(name='C') match p=n-[:LIKES]->item<-[:LIKES]-other where n <> other return n.name,other.name,collect(item.name) as common, count(*) as freq order by freq desc;
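As a rough illustration of what a fixed start point returns, the Python sketch below simulates the query for one start user. The `likes` data is reconstructed from the question's output and the start name "u1" is a hypothetical choice. Note that the result contains no mirrored rows only because every row begins with the chosen user, so it covers that user's pairs rather than all pairs in the graph.

```python
# Hypothetical data mirroring the question's output: user -> liked items.
likes = {
    "u1": {"f1", "f2", "f3"},
    "u2": {"f1", "f2", "f3"},
    "u3": {"f1", "f2"},
    "u4": {"f1"},
}


def pairs_from(start):
    """Simulate the query with a single, fixed start node."""
    rows = []
    for other in likes:
        common = sorted(likes[start] & likes[other])
        if other != start and common:
            rows.append((start, other, common, len(common)))
    # Same ordering as "order by freq desc".
    rows.sort(key=lambda r: -r[3])
    return rows


rows = pairs_from("u1")  # "u1" is an assumed start node
for row in rows:
    print(row)
```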

Upvotes: 0
