user4279562

Reputation: 669

return top n results for each query in Neo4j

I've been trying to write the following task as a Cypher query, but I am not getting the right results. Other Stack Overflow questions discuss LIMIT or collect, but I do not think that is enough for this task.

Task: I have (p:Product) nodes and between two product nodes there is a relationship called "BOUGHT_TOGETHER". That is

(p:Product)-[b:BOUGHT_TOGETHER]-(q:Product)

The relationship b has a property called "size", which holds a number. I want to return the top 3 relationships for each product, ordered by size in descending order. For instance, the query result should look like the following.

+------+------+--------+
| p.id | q.id | b.size |
+------+------+--------+
| 1    | 2    | 10     |
| 1    | 3    | 8      |
| 1    | 5    | 7      |
| 2    | 21   | 34     |
| 2    | 17   | 20     |
| 2    | 35   | 15     |
| 3    | 5    | 49     |
| 3    | 333  | 30     |
| 3    | 65   | 5      |
| ...  | ...  | ...    |
+------+------+--------+

Can someone show me how to write a cypher query in order to achieve the desired results? Thank you!

Upvotes: 5

Views: 6326

Answers (3)

Christophe Willemsen

Reputation: 20175

Another solution is to first order the relationships, collect them into a list, and UNWIND only the first 3 elements of that list:

MATCH (p:Product)-[r:BOUGHT_TOGETHER]->(:Product)
WITH p, r
ORDER BY r.size DESC 
WITH p, collect(r) AS bts 
UNWIND bts[0..3] AS r
RETURN p.uuid as pid, endNode(r).uuid as qid, r.size as size

Test console here : http://console.neo4j.org/r/r88ijn

NB: After re-reading jjaderberg's answer, this is quite similar, just (I think) more readable, which is why I upvoted his answer as well.

Upvotes: 8

MicTech

Reputation: 45003

Cypher has LIMIT and ORDER BY clauses.
http://neo4j.com/docs/stable/query-limit.html
http://neo4j.com/docs/stable/query-order.html

MATCH (p:Product)-[b:BOUGHT_TOGETHER]-(q:Product) 
RETURN p.id, q.id, b.size 
ORDER BY b.size DESC
LIMIT 3;
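
Note that a LIMIT at the end of a query caps the whole result set, so the query above returns three rows in total rather than three per product. A per-product LIMIT can be sketched with a correlated subquery. This is only a sketch and assumes a Neo4j version that supports CALL { ... } subqueries with an importing WITH (4.1 or later, as far as I know); the labels and properties are the ones from the question.

```cypher
// Sketch only: assumes Neo4j 4.1+ (correlated CALL subqueries).
MATCH (p:Product)
CALL {
  WITH p                                    // import the outer product into the subquery
  MATCH (p)-[b:BOUGHT_TOGETHER]-(q:Product)
  RETURN q, b
  ORDER BY b.size DESC
  LIMIT 3                                   // applied once per p, not once overall
}
RETURN p.id, q.id, b.size
ORDER BY p.id, b.size DESC;
```

Products without any BOUGHT_TOGETHER relationship produce no rows here, because the subquery returns nothing for them.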

Upvotes: 3

jjaderberg

Reputation: 9952

Here's one way to do it (it seems there should be a way to use LIMIT, but I couldn't come up with one just now).

I generated an example graph with

FOREACH (a IN [[1,2,10],[1,3,8],[1,5,7],[2,21,34],[2,17,20],[2,35,15],[3,5,49],[3,333,30],[3,65,5],[1,4,1],[3,6,100]] |
     MERGE (p:Product { id:a[0]})
     MERGE (q:Product { id:a[1]})
     CREATE (p)-[b:BOUGHT_TOGETHER { size:a[2]}]->(q)
)

This is the data from your table of desired output, plus two additional items: [1,4,1] and [3,6,100]. Having more than three relationships for some nodes helps us test that the query gets the correct three: the results for 1 should not contain [1,4,1], and the results for 3 should now contain [3,6,100] instead of [3,65,5].

If this is an accurate sample of your data, then this query should do what you want:

MATCH (p:Product)-[b:BOUGHT_TOGETHER]-(q:Product)
WITH p.id AS pid, q.id AS qid, b.size AS bsize
ORDER BY bsize DESC 
WITH pid, collect([qid, bsize])[..3] AS qb
UNWIND qb AS uqb
RETURN pid, uqb[0] AS qid, uqb[1] AS bsize
ORDER BY pid, bsize DESC

The idea is to order all the result items by b.size, then collect them per p and throw away all but the first three items in each collection, then unwind and return. The results will not look exactly like your output table because they include the relationships in the other direction as well ([5,1,7] as well as [1,5,7]), but I think that's what you would want anyway.

If this works, you might want to see if you can defer reading off properties until after you have trimmed the collections to save some database hits.
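
A rough sketch of that deferral, under the assumption that product ids are unique (so grouping by the node p is equivalent to grouping by p.id): carry the nodes and relationships themselves through the collect step and read q.id only for the three rows that survive the slice. The map keys other and rel are just names I picked for illustration.

```cypher
// Sketch only: same shape as the query above, but q and b are carried
// through as graph entities instead of being turned into property values early.
MATCH (p:Product)-[b:BOUGHT_TOGETHER]-(q:Product)
WITH p, q, b
ORDER BY b.size DESC                        // size is still read for every relationship here
WITH p, collect({other: q, rel: b})[..3] AS top3
UNWIND top3 AS item
RETURN p.id AS pid, item.other.id AS qid, item.rel.size AS bsize
ORDER BY pid, bsize DESC
```

The sort still has to read b.size for every relationship, so any saving comes from the id lookups. Profiling both versions with PROFILE is the way to confirm whether it actually helps on your data.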

Upvotes: 3
