Reputation: 3863
Suppose I have the following data:
id date another_info
1 2014-02-01 kjkj
1 2014-03-11 ajskj
1 2014-05-13 kgfd
2 2014-02-01 SADA
3 2014-02-01 sfdg
3 2014-06-12 fdsA
For each id, I want to extract the latest row:
id date another_info
1 2014-05-13 kgfd
2 2014-02-01 SADA
3 2014-06-12 fdsA
How could I manage that?
Upvotes: 173
Views: 109306
Reputation: 6712
Another approach is to use JOIN LATERAL. LATERAL is part of the SQL standard, so this should be reasonably portable, and it can be more performant than DISTINCT ON in certain cases:
SELECT tab.*
FROM
(SELECT DISTINCT id FROM the_table) ids
JOIN LATERAL
(SELECT *
FROM the_table
WHERE id = ids.id
ORDER BY date DESC
LIMIT 1
) tab ON true;
This can be efficient in Postgres (v 17) under two conditions:
The latter point can be an issue, because Postgres will often choose an expensive sequential scan to get the distinct ids. But if you already have the ids from another query or as query parameters, you can use those instead, e.g. SELECT id FROM unnest(ARRAY[1,2,3]) ids(id). If you have an index on (id) that is significantly smaller than the full table, that can also help (though Postgres will still do a full scan over the index).
Upvotes: 1
Reputation: 111
For most scenarios, the most efficient way is to use GROUP BY.
The accepted answer states that distinct on (id) is the most efficient way to solve the problem described in the question, but I believe that is far from accurate.
Sadly, I couldn't find any helpful insights in the Postgres docs, but I did find this article, which references a few others and provides examples where the GROUP BY approach clearly leads to better performance.
We had a discussion about this at work and ran a small experiment on a table that holds data about tags' blinks, with 4,114,692 rows and separate indexes on tag_id and on timestamp.
Here are the queries:
1. Using distinct on:
select distinct on (tag_id) tag_id, timestamp, some_data
from blinks
order by tag_id, timestamp desc;
2. Using CTE + group by + join:
with blink_last_timestamp as (
select tag_id, max(timestamp) as max_timestamp
from blinks
group by tag_id )
select bl.tag_id, max_timestamp, some_data
from blink_last_timestamp bl
join blinks b on
b.tag_id = bl.tag_id and
b.timestamp = bl.max_timestamp;
The results were unambiguous and favored the second solution for this scenario (which is pretty generic in my opinion):
it was about 10 times faster, 1655.991 ms (00:01.656) vs 16723.346 ms (00:16.723), and delivered the same data.
Upvotes: 5
Reputation: 97
I found this to be the fastest solution:
SELECT t1.*
FROM yourTable t1
LEFT JOIN yourTable t2 ON t2.tag_id = t1.tag_id AND t2.value_time > t1.value_time
WHERE t2.tag_id IS NULL
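To see what this anti-join returns, here is a minimal sketch that runs the same pattern against the question's sample data. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Postgres, and it maps yourTable, tag_id and value_time onto the question's the_table, id and date; the function name is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "2014-02-01", "kjkj"),
    (1, "2014-03-11", "ajskj"),
    (1, "2014-05-13", "kgfd"),
    (2, "2014-02-01", "SADA"),
    (3, "2014-02-01", "sfdg"),
    (3, "2014-06-12", "fdsA"),
]

def latest_per_id_anti_join(rows):
    """Keep each row for which no newer row with the same id exists."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Postgres here
    conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT, another_info TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.id, t1.date, t1.another_info
        FROM the_table t1
        LEFT JOIN the_table t2
               ON t2.id = t1.id AND t2.date > t1.date
        WHERE t2.id IS NULL          -- no newer row was found for t1
        ORDER BY t1.id, t1.another_info
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(latest_per_id_anti_join(ROWS))
```

One thing to keep in mind: if two rows share the newest date for an id, neither has a strictly newer partner, so this query returns both of them.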
Upvotes: 8
Reputation:
The most efficient way is to use Postgres' distinct on operator:
select distinct on (id) id, date, another_info
from the_table
order by id, date desc;
If you want a solution that works across databases (but is less efficient) you can use a window function:
select id, date, another_info
from (
select id, date, another_info,
row_number() over (partition by id order by date desc) as rn
from the_table
) t
where rn = 1
order by id;
The solution with a window function is in most cases faster than using a sub-query.
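Since distinct on is Postgres-specific, the window-function variant is the one that can be tried elsewhere. The following is a minimal sketch, assuming SQLite 3.25 or newer (accessed through Python's sqlite3 module) as a stand-in database, with the question's sample rows loaded into the_table; the function name is hypothetical.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "2014-02-01", "kjkj"),
    (1, "2014-03-11", "ajskj"),
    (1, "2014-05-13", "kgfd"),
    (2, "2014-02-01", "SADA"),
    (3, "2014-02-01", "sfdg"),
    (3, "2014-06-12", "fdsA"),
]

def latest_per_id_window(rows):
    """Return the newest row per id using row_number() (needs SQLite >= 3.25)."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Postgres here
    # Dates are stored as ISO-8601 text, which sorts chronologically.
    conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT, another_info TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, date, another_info
        FROM (
            SELECT id, date, another_info,
                   row_number() OVER (PARTITION BY id ORDER BY date DESC) AS rn
            FROM the_table
        ) t
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(latest_per_id_window(ROWS))
```

Because row_number() assigns exactly one row the value 1 per partition, this always yields a single row per id, even when several rows share the newest date (which of the tied rows wins is then arbitrary).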
Upvotes: 327
Reputation: 21963
select *
from bar
where (id,date) in (select id,max(date) from bar group by id)
Tested in PostgreSQL and MySQL.
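The row-value IN form can also be checked quickly without a database server. This is a minimal sketch, assuming SQLite 3.15 or newer (which accepts row values in IN with a subquery) through Python's sqlite3 module; the table name bar comes from the query above, while the column layout and the function name are assumptions based on the question's data.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "2014-02-01", "kjkj"),
    (1, "2014-03-11", "ajskj"),
    (1, "2014-05-13", "kgfd"),
    (2, "2014-02-01", "SADA"),
    (3, "2014-02-01", "sfdg"),
    (3, "2014-06-12", "fdsA"),
]

def latest_per_id_row_in(rows):
    """Select rows whose (id, date) pair equals that id's maximum date."""
    conn = sqlite3.connect(":memory:")  # SQLite is used only for illustration
    conn.execute("CREATE TABLE bar (id INTEGER, date TEXT, another_info TEXT)")
    conn.executemany("INSERT INTO bar VALUES (?, ?, ?)", rows)
    query = """
        SELECT *
        FROM bar
        WHERE (id, date) IN (SELECT id, max(date) FROM bar GROUP BY id)
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(latest_per_id_row_in(ROWS))
```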
Upvotes: 32
Reputation: 861
Group by id and use an aggregate function that matches the criteria for the last record. For example:
select id, max(date), another_info
from the_table
group by id, another_info
Upvotes: -8