captnswing
captnswing

Reputation: 645

select subset of rows

I have the following table

ID  NAME    DATE        VALUE
1   John    2022-02-01  27
2   John    2022-02-01  9
3   John    2022-01-31  -12
4   John    2021-12-15  32
5   Tom     2021-10-22  11
6   Tom     2021-05-05  201
7   Tom     2020-11-27  76
8   Bob     2022-01-07  43
9   Bob     2021-08-26  3
10  Bob     2021-05-13  12
11  Bob     2021-03-08  2

I want to write a query that ignores the rows with the latest DATE for each NAME

ID  NAME    DATE        VALUE
3   John    2022-01-31  -12
4   John    2021-12-15  32
6   Tom     2021-05-05  201
7   Tom     2020-11-27  76
9   Bob     2021-08-26  3
10  Bob     2021-05-13  12
11  Bob     2021-03-08  2

No idea how to write a query like that....

Upvotes: 1

Views: 168

Answers (3)

robodisco
robodisco

Reputation: 4272

When writing a query to answer get all the rows with (or without) the highest X you can often solve it using window functions like Zakaria's answer, DISTINCT ON, or lateral joins.

I personally prefer the expressiveness of window functions but there are times where lateral joins are better as they can be more performant

Here is how to do it with a lateral.

First we need to project a table where for each row in your original table we tie it to the row in it's group (i.e. rows with the same name) with the latest date.

select users.*, latest_dates.* from users
left join lateral (
        select u.* from users as u
  where u.name = users.name AND u.date > users.date
  order by u.date DESC
  limit 1
) as latest_dates on true;



 id | name |        date         | value | id | name |        date         | value
----+------+---------------------+-------+----+------+---------------------+-------
  1 | John | 2022-02-01 00:00:00 |    27 |    |      |                     |
  2 | John | 2022-02-01 00:00:00 |     9 |    |      |                     |
  3 | John | 2022-01-31 00:00:00 |   -12 |  1 | John | 2022-02-01 00:00:00 |    27
  4 | John | 2021-12-15 00:00:00 |    32 |  1 | John | 2022-02-01 00:00:00 |    27
  5 | Tom  | 2021-10-22 00:00:00 |    11 |    |      |                     |
  6 | Tom  | 2021-05-05 00:00:00 |   201 |  5 | Tom  | 2021-10-22 00:00:00 |    11
  7 | Tom  | 2020-11-27 00:00:00 |    76 |  5 | Tom  | 2021-10-22 00:00:00 |    11
  8 | Bob  | 2022-01-07 00:00:00 |    43 |    |      |                     |
  9 | Bob  | 2021-08-26 00:00:00 |     3 |  8 | Bob  | 2022-01-07 00:00:00 |    43
 10 | Bob  | 2021-05-13 00:00:00 |    12 |  8 | Bob  | 2022-01-07 00:00:00 |    43
 11 | Bob  | 2021-03-08 00:00:00 |     2 |  8 | Bob  | 2022-01-07 00:00:00 |    43

Above we can see for rows which are the latest date in the group will have null values on the right hand side. So its simply a case of creating a query to exclude such values.

select users.*, latest_dates.* from users
left join lateral (
    select u.* from users as u
  where u.name = users.name AND u.date > users.date 
  order by u.date DESC
  limit 1
) as latest_dates on true
where latest_dates.date IS NOT NULL;

This gives you the result you are looking for. Here is the fiddle

Upvotes: 1

Zakaria
Zakaria

Reputation: 4786

You can use rank() to rank the rows by descending DATE, then filter out rows with rank = 1 :

select ID, NAME, DATE, VALUE from
(select *,
rank() over(partition by NAME order by DATE desc) r
from table_name) t
where r > 1;

Fiddle

Upvotes: 3

ansev
ansev

Reputation: 30920

We could do:

WITH max_date_table AS
(
SELECT MAX(DATE) AS max_date_value, NAME
FROM mytable
GROUP BY NAME
)
SELECT mytable.ID,  mytable.NAME, mytable.DATE, mytable.VALUE
FROM mytable 
INNER JOIN max_date_table
ON mytable.NAME=max_date_table.NAME
WHERE mytable.DATE != max_date_table.max_date_value

Upvotes: 0

Related Questions