Reputation: 622
In trying to make a query in which I pull the top 1000 posts (by time_spent
) per specified tag, I came up with the following query, where 1, 2, 3
are the specified tags:
SELECT g.tagid, e.post_id, SUM(e.time_spent) AS time
FROM post_table e
JOIN (SELECT g.postid, g.tagid
FROM tags_table g
WHERE g.tagid IN (1, 2, 3)) g
ON e.post_id = g.postid
WHERE dt >= '2018-06-01'
GROUP BY g.tagid, e.post_id
ORDER BY time DESC
LIMIT 1000
However, the problem with using LIMIT 1000
here is that it limits on the entire group and makes it so I only obtain 1000 results in total instead of obtaining 1000 results for each of tag 1, tag 2, and tag 3 (i.e. 3000 results in total).
How can I modify this query such that the LIMIT
only occurs on the e.post_id
component of the GROUP BY
? Or alternatively, is there another way of obtaining 1000 results per each of the tags specified in the IN
clause?
Upvotes: 0
Views: 33
Reputation: 1271131
Use row_number()
:
SELECT ge.*
FROM (SELECT g.tagid, e.post_id, SUM(e.time_spent) AS time,
ROW_NUMBER() OVER (PARTITION BY g.tagid ORDER BY SUM(e.time_spent) ) as seqnum
FROM post_table e JOIN
tags_table g
ON e.post_id = g.postid
WHERE g.tagid IN (1, 2, 3) AND dt >= '2018-06-01'
GROUP BY g.tagid, e.post_id
) ge
WHERE seqnum <= 1000
ORDER BY t.tagid, time DESC
Upvotes: 1