Marta

Reputation: 3863

Postgresql extract last row for each id

Suppose I have the following data:

  id    date          another_info
  1     2014-02-01         kjkj
  1     2014-03-11         ajskj
  1     2014-05-13         kgfd
  2     2014-02-01         SADA
  3     2014-02-01         sfdg
  3     2014-06-12         fdsA

For each id, I want to extract the latest row:

  id    date          another_info
  1     2014-05-13         kgfd
  2     2014-02-01         SADA
  3     2014-06-12         fdsA

How could I manage that?

Upvotes: 173

Views: 109306

Answers (6)

JanKanis

Reputation: 6712

Another approach is to use JOIN LATERAL. This should be portable and can be more performant than using DISTINCT ON in certain cases:

SELECT tab.*
FROM
    (SELECT DISTINCT id FROM the_table) ids
JOIN LATERAL
    (SELECT *
     FROM the_table
     WHERE id = ids.id
     ORDER BY date DESC
     LIMIT 1
    ) tab ON true

This can be efficient in Postgres (v 17) under two conditions:

  • You have an index on (id, date DESC, ...)
  • You have an efficient way to get the ids you want.

The latter point can be an issue, because often Postgres will choose to do an expensive sequential scan to get the distinct ids. But if you already get the ids from another query or as query parameters you can use those. E.g. SELECT id FROM unnest(ARRAY[1,2,3]) ids(id). If you have an index on (id) that is significantly smaller than the full table that can also help (though PG will still do a full scan over the index).
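When the ids are already known on the client side, the lateral join boils down to one indexed top-1 lookup per id. Here is a minimal sketch of that idea, assuming Python's sqlite3 module as a stand-in for Postgres (SQLite has no LATERAL, so the loop plays that role). The helper name latest_rows and the index name are made up for illustration; the table and sample rows come from the question.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (an assumption so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT, another_info TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (1, "2014-02-01", "kjkj"),
        (1, "2014-03-11", "ajskj"),
        (1, "2014-05-13", "kgfd"),
        (2, "2014-02-01", "SADA"),
        (3, "2014-02-01", "sfdg"),
        (3, "2014-06-12", "fdsA"),
    ],
)
# Index shaped like the first condition above: (id, date DESC).
conn.execute("CREATE INDEX the_table_id_date_idx ON the_table (id, date DESC)")


def latest_rows(conn, ids):
    """Return the newest row for each requested id, one top-1 lookup per id."""
    rows = []
    for wanted_id in ids:
        row = conn.execute(
            "SELECT id, date, another_info FROM the_table "
            "WHERE id = ? ORDER BY date DESC LIMIT 1",
            (wanted_id,),
        ).fetchone()
        if row is not None:  # ids without rows are skipped, like an inner join
            rows.append(row)
    return rows


result = latest_rows(conn, [1, 2, 3])
print(result)
```

In Postgres itself you would keep this in a single statement by feeding the ids through unnest(...) into the lateral join, which avoids one round trip per id.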

Upvotes: 1

Or207

Reputation: 111

For most scenarios, the most efficient way is to use GROUP BY

The accepted answer states that distinct on (id) is the most efficient way to solve the problem described in the question, but I believe that is not accurate. Sadly, I couldn't find any helpful insights in the Postgres docs, but I did find this article, which references a few others and provides examples where the

GROUP BY approach definitely leads to better performance


We had a discussion about this at work and ran a small experiment on a table that holds data about tags' blinks. It has 4,114,692 rows and separate indexes on tag_id and on timestamp.

Here are the queries:

1. Using distinct on:

select distinct on (tag_id) tag_id, timestamp, some_data
from blinks
order by tag_id, timestamp desc;

2. Using CTE + group by + join:

with blink_last_timestamp as (
    select tag_id, max(timestamp) as max_timestamp
    from blinks
    group by tag_id )
select bl.tag_id, max_timestamp, some_data
from blink_last_timestamp bl
join blinks b on
    b.tag_id = bl.tag_id and
    b.timestamp = bl.max_timestamp

The results were unambiguous and favored the second solution for this scenario (which is pretty generic, in my opinion): it was about 10 times (!) faster, 1655.991 ms (00:01.656) vs 16723.346 ms (00:16.723), and of course it delivered the same data.

Upvotes: 5

Vladislav Stoitsov

Reputation: 97

I found this to be the fastest solution:

 SELECT t1.*
   FROM yourTable t1
     LEFT JOIN yourTable t2 ON t2.tag_id = t1.tag_id AND t2.value_time > t1.value_time
  WHERE t2.tag_id IS NULL
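To see how this anti-join maps onto the question's data, here is a small sketch that swaps in the question's column names (id and date instead of tag_id and value_time) and runs the query through Python's sqlite3 module. SQLite is only an assumed stand-in for Postgres so that the example is self-contained; it shows the result, not the performance.

```python
import sqlite3

# SQLite is used as a stand-in for PostgreSQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT, another_info TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (1, "2014-02-01", "kjkj"),
        (1, "2014-03-11", "ajskj"),
        (1, "2014-05-13", "kgfd"),
        (2, "2014-02-01", "SADA"),
        (3, "2014-02-01", "sfdg"),
        (3, "2014-06-12", "fdsA"),
    ],
)

# Keep a row only if no other row with the same id has a later date.
anti_join = """
SELECT t1.id, t1.date, t1.another_info
FROM the_table t1
LEFT JOIN the_table t2 ON t2.id = t1.id AND t2.date > t1.date
WHERE t2.id IS NULL
ORDER BY t1.id
"""
latest = conn.execute(anti_join).fetchall()
for row in latest:
    print(row)
```

The ORDER BY is added only to make the output order predictable; the query in the answer does not sort.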

Upvotes: 8

user330315

Reputation:

The most efficient way is to use Postgres' distinct on clause:

select distinct on (id) id, date, another_info
from the_table
order by id, date desc;

If you want a solution that works across databases (but is less efficient) you can use a window function:

select id, date, another_info
from (
  select id, date, another_info, 
         row_number() over (partition by id order by date desc) as rn
  from the_table
) t
where rn = 1
order by id;

The solution with a window function is in most cases faster than using a sub-query.
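Because the window-function query is the portable one, it can be checked outside Postgres. The sketch below runs it against the question's sample data with Python's sqlite3 module, assuming the bundled SQLite is 3.25 or newer (the first version with window functions). The distinct on query is Postgres-only and cannot be run this way.

```python
import sqlite3

# SQLite (3.25+) stands in for PostgreSQL here; this is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, date TEXT, another_info TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (1, "2014-02-01", "kjkj"),
        (1, "2014-03-11", "ajskj"),
        (1, "2014-05-13", "kgfd"),
        (2, "2014-02-01", "SADA"),
        (3, "2014-02-01", "sfdg"),
        (3, "2014-06-12", "fdsA"),
    ],
)

# Number the rows of each id from newest to oldest, then keep row number 1.
window_query = """
SELECT id, date, another_info
FROM (
  SELECT id, date, another_info,
         row_number() OVER (PARTITION BY id ORDER BY date DESC) AS rn
  FROM the_table
) t
WHERE rn = 1
ORDER BY id
"""
rows = conn.execute(window_query).fetchall()
for row in rows:
    print(row)
```

The rows printed match the desired output listed in the question.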

Upvotes: 327

Vivek S.

Reputation: 21963

select * 
from bar 
where (id,date) in (select id,max(date) from bar group by id)

Tested in PostgreSQL and MySQL.
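One thing to keep in mind with this query: if two rows of the same id share the maximum date, both are returned. The sketch below illustrates that with Python's sqlite3 module, assuming SQLite 3.15+ (which supports row values) as a stand-in for Postgres. The second id 1 row with 'dup' is made-up data added only to create a tie.

```python
import sqlite3

# SQLite stands in for PostgreSQL/MySQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER, date TEXT, another_info TEXT)")
conn.executemany(
    "INSERT INTO bar VALUES (?, ?, ?)",
    [
        (1, "2014-03-11", "ajskj"),
        (1, "2014-05-13", "kgfd"),
        (1, "2014-05-13", "dup"),  # hypothetical row: same id, same latest date
        (2, "2014-02-01", "SADA"),
    ],
)

# Same query as in the answer, with an ORDER BY added for a stable output order.
query = """
SELECT *
FROM bar
WHERE (id, date) IN (SELECT id, max(date) FROM bar GROUP BY id)
ORDER BY id, another_info
"""
tied = conn.execute(query).fetchall()
for row in tied:
    print(row)
```

If exactly one row per id is required even with ties, a query that picks a single row per group (such as distinct on or row_number()) is the safer choice.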

Upvotes: 32

Amal Ts

Reputation: 861

Group by id and use aggregate functions that match your criterion for the last record. For example:

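select id, max(date) as date,
       -- another_info from the latest row; grouping by it would return every (id, another_info) pair
       (array_agg(another_info order by date desc))[1] as another_info
from the_table
group by id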

Upvotes: -8
