Vagabundo

Reputation: 48

Postgres Group By preserving the previous order

I have a query like:

SELECT foo_id, bar, timestamp, ROW_NUMBER() OVER (ORDER BY timestamp ASC)
FROM foo_table
WHERE (foo_id = '1' OR related_foo_id = '1')
AND foo_body -> 'type' = 'My_foo_type';

And I get this:

╔═════════╦══════════╦══════════════╦══════════════╗
║ foo_id  ║    bar   ║  timestamp   ║  row_number  ║
╠═════════╬══════════╬══════════════╬══════════════╣
║  1      ║     1    ║      10      ║       1      ║
║  1      ║     1    ║      11      ║       2      ║
║  2      ║     1    ║      15      ║       3      ║
║  1      ║     2    ║      25      ║       4      ║
║  1      ║     2    ║      26      ║       5      ║
╚═════════╩══════════╩══════════════╩══════════════╝

I want to group by 'foo_id' and 'bar' values to get something like this:

╔═════════╦══════════╦══════════════╦══════════════╗
║ foo_id  ║    bar   ║  timestamp   ║  row_number  ║
╠═════════╬══════════╬══════════════╬══════════════╣
║  1      ║     1    ║      10      ║       1      ║
║  2      ║     1    ║      15      ║       2      ║
║  1      ║     2    ║      25      ║       3      ║
╚═════════╩══════════╩══════════════╩══════════════╝

If I group by foo_id and bar, I have to drop the timestamp column, but I still need to order by it. I don't care whether I get the first or the second row of each group, as long as (foo_id, bar) is unique in the result.

I tried ordering and grouping the result:

SELECT A.foo_id, A.bar, ROW_NUMBER() OVER ()
FROM (
  SELECT foo_id, bar FROM foo_table
  WHERE (foo_id = '1' OR related_foo_id = '1')
  AND foo_body -> 'type' = 'My_foo_type'
  ORDER BY timestamp ASC) A
GROUP BY foo_id, bar;

But it doesn't respect the order:

╔═════════╦══════════╦══════════════╦══════════════╗
║ foo_id  ║    bar   ║  timestamp   ║  row_number  ║
╠═════════╬══════════╬══════════════╬══════════════╣
║  1      ║     1    ║      10      ║       1      ║
║  1      ║     2    ║      25      ║       2      ║
║  2      ║     1    ║      15      ║       3      ║
╚═════════╩══════════╩══════════════╩══════════════╝

I also tried different joins, following the answers to other questions, but none of them worked: I always get the same result as the first SELECT.

Is there a way to get what I want without hurting performance too much? Thank you.

Upvotes: 1

Views: 288

Answers (1)

Gordon Linoff

Reputation: 1270391

I think you want distinct on:

SELECT DISTINCT ON (foo_id, bar) foo_id, bar, timestamp, ROW_NUMBER() OVER (ORDER BY timestamp ASC)
FROM foo_table
WHERE (foo_id = 1 OR related_foo_id = 1) AND
      foo_body -> 'type' = 'My_foo_type'
ORDER BY foo_id, bar, timestamp asc;

This will return one row per foo_id/bar combination -- the one with the lowest timestamp. This is based on the order by and distinct on clauses.

DISTINCT ON is a Postgres extension that is quite handy and can be the most efficient way to execute such a query.
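
One caveat worth checking: Postgres evaluates window functions before DISTINCT ON is applied, so in the query above ROW_NUMBER() numbers every filtered row rather than only the surviving ones (1, 4, 3 on the sample data), and the final ORDER BY sorts by foo_id and bar rather than by timestamp. A minimal sketch of one way around this, assuming the table and columns from the question, is to deduplicate in a subquery and number the remaining rows outside it. The alias deduped is made up for illustration.

```sql
-- Sketch: deduplicate first, then number the surviving rows by timestamp.
SELECT foo_id,
       bar,
       timestamp,
       ROW_NUMBER() OVER (ORDER BY timestamp ASC) AS row_number
FROM (
    -- One row per (foo_id, bar): the one with the lowest timestamp.
    SELECT DISTINCT ON (foo_id, bar) foo_id, bar, timestamp
    FROM foo_table
    WHERE (foo_id = '1' OR related_foo_id = '1')
      -- Filter copied from the question as-is. If foo_body is json/jsonb,
      -- ->> is the operator that returns text for this kind of comparison.
      AND foo_body -> 'type' = 'My_foo_type'
    ORDER BY foo_id, bar, timestamp ASC
) AS deduped  -- hypothetical alias
ORDER BY timestamp ASC;
```

On the sample data from the question this should give row numbers 1, 2, 3 in timestamp order, which is the result asked for.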

Upvotes: 1
