xyz83242
xyz83242

Reputation: 814

Grouped LIMIT in PostgreSQL: show the first N rows for each group, BUT only if the first of those row equals specific data

Consider the following table:

SELECT * FROM report_raw_data;
ts         | d_stamp    | id_mod | value
-----------+------------+--------+------
1605450647 | 2020-11-15 | 1      | 60
1605464634 | 2020-11-15 | 2      | 54
1605382126 | 2020-11-14 | 1      | 40
1605362085 | 2020-11-14 | 3      | 33
1605355089 | 2020-11-13 | 1      | 60
1605202153 | 2020-11-12 | 2      | 30

What I need is to get the first two rows ordered by ts of each id_mod but only if the d_stamp is the current date (in this case 2020-11-15).

So far I have managed to get the first two rows of each id_mod ordered by ts, but I struggle with the only current date 2020-11-15.

Here is my and wrong result try:

SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id_mod ORDER BY ts DESC) AS r,t.* FROM 
report_raw_data t) x WHERE x.r <= 2;

ts         | d_stamp    | id_mod | value
-----------+------------+--------+------
1605450647 | 2020-11-15 | 1      | 60
1605382126 | 2020-11-14 | 1      | 40
1605464634 | 2020-11-15 | 2      | 54
1605202153 | 2020-11-12 | 2      | 30
1605362085 | 2020-11-14 | 3      | 33

If I use in the query WHERE = '2020-11-15' I will ultimately get only those records (so no second rows) which I need.

This is what I would like to get (ignoring the id_mod number 3) since it's the first row does not start on 2020-11-15:

ts         | d_stamp    | id_mod | value
-----------+------------+--------+------
1605450647 | 2020-11-15 | 1      | 60
1605382126 | 2020-11-14 | 1      | 40
1605464634 | 2020-11-15 | 2      | 54
1605202153 | 2020-11-12 | 2      | 30

One more note: I will need to be able to use LIMIT and OFFSET with the query to be able to paginate through the results on the frontend.

Upvotes: 0

Views: 50

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

Assuming you have no future data, I would suggest:

SELECT rdr.*
FROM (SELECT rdr.*,
             ROW_NUMBER() OVER (PARTITION BY id_mod ORDER BY ts DESC) AS seqnum
      FROM report_raw_data rdr
      WHERE d_stamp = current_date
     ) rdr
WHERE seqnum <= 2;

Filtering based on the time in the subquery should significantly improve performance. And for optimal performance, you want an index on (d_stamp, id_mod, ts desc).

Upvotes: 0

GMB
GMB

Reputation: 222462

Starting from your current query, a simple approach is to use a window MAX() in the subquery to recover the latest ts per id_mod. You can then use that for additional filtering in the outer query.

SELECT * 
FROM (
    SELECT t.*,
        ROW_NUMBER() OVER (PARTITION BY id_mod ORDER BY ts DESC) AS rn,
        MAX(ts)      OVER(PARTITION BY id_mod) max_ts
    FROM report_raw_data t
) x 
WHERE rn <= 2 and max_ts = current_date;

Upvotes: 1

Related Questions