itaied
itaied

Reputation: 7107

How to SELECT in SQL based on a value from the same table column?

I have the following table

| id | date       | team |
|----|------------|------|
| 1  | 2019-01-05 | A    |
| 2  | 2019-01-05 | A    |
| 3  | 2019-01-01 | A    |
| 4  | 2019-01-04 | B    |
| 5  | 2019-01-01 | B    |

How can I query the table to receive the most recent values for the teams?

For example, the result for the above table would be ids 1,2,4.

Upvotes: 1

Views: 81

Answers (4)

David Weinberg
David Weinberg

Reputation: 1091

Window function is the best solution for you.

select id
from (
  select team, id, rank() over (partition by team order by date desc) as row_num
  from table
) t
where row_num = 1

That query will return this table:

| id |
|----|
| 1  |
| 2  |
| 4  |

If you to get it one row per team, you need to use array_agg function.

select team, array_agg(id) ids
from (
  select team, id, rank() over (partition by team order by date desc) as row_num
  from table
) t
where row_num = 1
group by team

That query will return this table:

| team |  ids   |
|------|--------|
|  A   | [1, 2] |
|  B   | [4]    |

Upvotes: 0

tonypdmtr
tonypdmtr

Reputation: 3225

One more possibility, generic:

select * from t join (select max(date) date,team from t
                        group by team) tt
  using(date,team)

Upvotes: 0

Hambone
Hambone

Reputation: 16377

If your dataset is large, consider the max analytic function in a subquery:

with cte as (
  select
    id, date, team,
    max (date) over (partition by team) as max_date
  from t
)
select id
from cte
where date = max_date

Notionally, max is O(n), so it should be pretty efficient. I don't pretend to know the actual implementation on PostgreSQL, but my guess is it's O(n).

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269443

In this case, you can use window functions:

select t.*
from (select t.*, rank() over (partition by team order by date desc) as seqnum
      from t
     ) t
where seqnum = 1;

In some databases a correlated subquery is faster with the right indexes (I haven't tested this with Postgres):

select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.team = t.team);

And if you wanted only one row per team, then the canonical answer is:

select distinct on (t.team) t.*
from t
order by t.team, t.date desc;

However, that doesn't work in this case because you want all rows from the most recent date.

Upvotes: 3

Related Questions