Mark McKelvy
Mark McKelvy

Reputation: 3536

Groupwise select nth row Postgres

I have an problem that falls into the "greatest-n-per-group" category, but with a slight twist. I have a table along the lines of the following:

| t_id | t_amount | b_id | b_amount |
|------|----------|------|----------|
| 1    | 50       | 7    | 50       |
| 1    | 50       | 15   | 50       |
| 1    | 50       | 80   | 50       |
| 3    | 50       | 7    | 50       |
| 3    | 50       | 15   | 50       |
| 3    | 50       | 80   | 50       |
| 17   | 50       | 7    | 50       |
| 17   | 50       | 15   | 50       |
| 17   | 50       | 80   | 50       |

What I'd like to do is essentially partition this table by t_id and then select the first row of the first partition, the second row of the second partition, and the third row of the third partition, with the results looking like this:

| t_id | t_amount | b_id | b_amount |
|------|----------|------|----------|
| 1    | 50       | 7    | 50       |
| 3    | 50       | 15   | 50       |
| 17   | 50       | 80   | 50       |

It seems like a window function or something with distinct on might do the trick, but I haven't yet put it together.

I'm using Postgres 10 on a *nix system.

Upvotes: 2

Views: 480

Answers (1)

S-Man
S-Man

Reputation: 23676

Using window functions dense_rank and row_number would do it

https://www.postgresql.org/docs/10/static/functions-window.html


Solution: db<>fiddle

SELECT
    t_id,
    t_amount,
    b_id,
    b_amount
FROM
    (
        SELECT
            *,
            dense_rank() over (ORDER BY t_id) as group_number,          -- A
            row_number() over (PARTITION BY t_id ORDER BY t_id, b_id) 
                 as row_number_in_group                                 -- B
        FROM
            test_data) s
WHERE
    group_number = row_number_in_group

A dense_rank increases a number per given group (a partition over t_id). So every t_id is getting its own value.

B row_number counts the rows within a given partition.

I illustrate the result of the subquery here:

t_id  t_amount  b_id  b_amount  dense_rank  row_number  
----  --------  ----  --------  ----------  ----------  
1     50        7     50        1           1           
1     50        15    50        1           2           
1     50        80    50        1           3           
3     50        7     50        2           1           
3     50        15    50        2           2           
3     50        80    50        2           3           
17    50        7     50        3           1           
17    50        15    50        3           2           
17    50        80    50        3           3     

Now you have to filter where group number equals row number within the group and you get your expected result.

Upvotes: 4

Related Questions