taotsi

Reputation: 354

most efficient way to select duplicate rows with max timestamp

Suppose I have a table called t, which is like

id  content  time
1     'a'     100
1     'a'     101
1     'b'     102
2     'c'     200
2     'c'     201

The id column contains duplicates, and for the same id the content can also be duplicated. Now I want to select, for each id, the row with the maximum time, which would be

id  content  time
1     'b'     102
2     'c'     201

And this is my current solution:

select t1.id, t1.content, t1.time 
from (
  select id, content, time from t 
) as t1 
right join (
  select id, max(time) as time from t group by id
) as t2 
on t1.id = t2.id and t1.time = t2.time;

But this looks inefficient to me: theoretically, once select id, max(time) as time from t group by id has executed, the rows I want have already been located. The right join adds extra cost (up to O(n^2) without an index), which seems unnecessary.

So is there a more efficient way to do this, or is there something I misunderstand?
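For comparison, another common formulation uses a window function. This is a runnable sketch (not necessarily faster than the join; it depends on the planner and indexes) against an in-memory SQLite table built from the sample data above; SQLite has supported ROW_NUMBER since 3.25, and the same SQL works on Postgres:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INTEGER, content TEXT, time INTEGER);
INSERT INTO t VALUES (1,'a',100),(1,'a',101),(1,'b',102),(2,'c',200),(2,'c',201);
""")

# Rank rows within each id by time descending, then keep only the top-ranked row.
rows = con.execute("""
SELECT id, content, time FROM (
  SELECT id, content, time,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
  FROM t
) AS ranked
WHERE rn = 1
ORDER BY id;
""").fetchall()

print(rows)  # [(1, 'b', 102), (2, 'c', 201)]
```

This scans the table once and numbers rows per id, so no self-join is needed.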

Upvotes: 0

Views: 2161

Answers (2)

Tim Biegeleisen

Reputation: 521409

Use DISTINCT ON:

SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;

On Postgres, this is usually the most performant way to write your query, and it should outperform ROW_NUMBER and other approaches.

The following index might speed up this query:

CREATE INDEX idx ON yourTable (id, time DESC, content);

This index, if used, would let Postgres rapidly find, for each id, the record having the latest time. This index also covers the content column.
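To check whether Postgres actually uses the index, one can inspect the plan with something like the following (the table and index names here follow the example above; the plan output will vary with data size and statistics):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (id) id, content, time
FROM yourTable
ORDER BY id, time DESC;
```

If the index is picked up, the plan should show an index scan on idx rather than a sequential scan plus sort.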

Upvotes: 1

Nayanish Damania

Reputation: 652

Try a join against the per-id maximum:

SELECT a.id, a.content, a.time
FROM t AS a
INNER JOIN (
    SELECT id, MAX(time) AS time
    FROM t
    GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time;
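As a self-contained check (using Python's sqlite3 with the sample data from the question), a join against the per-id maximum returns the expected rows; note that the inner query must group by id, not content, to match the desired output:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INTEGER, content TEXT, time INTEGER);
INSERT INTO t VALUES (1,'a',100),(1,'a',101),(1,'b',102),(2,'c',200),(2,'c',201);
""")

# Join each row to its id's maximum time; only the latest row per id survives.
rows = con.execute("""
SELECT a.id, a.content, a.time
FROM t AS a
INNER JOIN (
  SELECT id, MAX(time) AS time FROM t GROUP BY id
) AS b ON a.id = b.id AND a.time = b.time
ORDER BY a.id;
""").fetchall()

print(rows)  # [(1, 'b', 102), (2, 'c', 201)]
```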

Upvotes: 0
