arik

Reputation: 29330

Postgres Select Row From Group With Opposite Sort Order

I have a Postgres table foobar with the following schema:

id: integer
group: integer
foo: integer
bar: integer
timestamp: integer

I keep track of updates to various groups, and those updates mutate the foo and bar properties. Every time I receive an update, I store the timestamp.

Here's an example of the values I could have in the database:

+------+---------+-------+-------+-------------+
| "id" | "group" | "foo" | "bar" | "timestamp" |
+------+---------+-------+-------+-------------+
| 1    | 1       | 10    | 20    | 1           |
| 2    | 1       | 11    | 19    | 2           |
| 3    | 1       | 10    | 20    | 3           |
| 4    | 1       | 10    | 20    | 4           |
+------+---------+-------+-------+-------------+
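
To reproduce the example, a minimal setup sketch could look like the following. The column names and types come from the schema above; the primary key and the NOT NULL constraints are assumptions added for illustration only.

```sql
-- "group" is a reserved word in Postgres and must be quoted;
-- "timestamp" is quoted here only for consistency with the queries in this thread.
create table foobar (
    id          integer primary key,   -- assumption: id is the key
    "group"     integer not null,      -- assumption: no NULLs
    foo         integer not null,
    bar         integer not null,
    "timestamp" integer not null
);

insert into foobar (id, "group", foo, bar, "timestamp") values
    (1, 1, 10, 20, 1),
    (2, 1, 11, 19, 2),
    (3, 1, 10, 20, 3),
    (4, 1, 10, 20, 4);
```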

Oftentimes, the updates I receive are identical. A particularly critical piece of information I'm trying to extract is when I first received the combination of values that is current: not the first time it ever appeared, but the first update since which nothing has changed. In the example above, that is the row with id 3 (timestamp 3).

A naïve approach would be the following query:

SELECT DISTINCT ON ("group", foo, bar) *
FROM foobar
ORDER BY "group", foo, bar, timestamp DESC;

However, that query would return the last row, which has the latest timestamp. If I switch timestamp to ASC, I would get the very first row, because I have seen the exact foo/bar value combination prior to the update at timestamp 2.

The intuitive thing would have been to simply move the timestamp DESC sort key ahead of foo, but Postgres does not allow that: the DISTINCT ON expressions must match the leftmost ORDER BY expressions. Something like MySQL's more permissive HAVING could also have come in handy, but Postgres's HAVING is stricter and doesn't seem to help here.

An incredibly inefficient approach I could take is to programmatically iterate through each group, get the latest row, then fetch all rows in descending timestamp order and stop as soon as I observe a change. It seems that letting the database do this sort of operation would be wiser.

I am quite certain that I'm missing something obvious, but would greatly appreciate any help. Thanks!

Upvotes: 4

Views: 103

Answers (3)

Zegarek

Reputation: 26322

That's a gaps-and-islands problem.
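You can compare each row's (foo,bar) to the previous row's using lag() over a window in ascending timestamp order, or equivalently lead() over a window in descending order, which is what the query below does. The window definition's partition by makes sure you only compare rows coming from the same group.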
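
As a rough sketch of that comparison on its own, the query below uses lag() over an ascending window to flag every row whose (foo,bar) differs from the previous row in the same group. The alias is_diff_from_prev and the window name w are illustrative, and the coalesce() treats the first row of each group (which has no predecessor) as a change.

```sql
select id, "group", foo, bar, "timestamp",
       -- null (no previous row) counts as "changed"
       coalesce(   foo <> lag(foo) over w
                or bar <> lag(bar) over w, true) as is_diff_from_prev
from foobar
window w as (partition by "group" order by "timestamp")
order by "group", "timestamp";
```

On the four sample rows from the question, ids 1, 2 and 3 come out flagged as changes and id 4 does not.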

From that, you can get the latest "changing update" per group with a distinct on() or the 1=row_number()over w, as already described by @Stefanov.sm.
Demo at db<>fiddle:

select distinct on("group") id,"group",foo,bar,"timestamp"
from (select *,coalesce(   foo<>lead(foo)over w1
                        or bar<>lead(bar)over w1,true) is_diff_from_prev
      from foobar
      window w1 as (partition by "group" order by "timestamp" desc))_
where is_diff_from_prev
order by "group","timestamp" desc;
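+------+---------+-------+-------+-------------+
| "id" | "group" | "foo" | "bar" | "timestamp" |
+------+---------+-------+-------+-------------+
| 3    | 1       | 10    | 20    | 3           |
+------+---------+-------+-------+-------------+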

For each group, this starts from the latest record and seeks the first one that changed either value. It takes around 3s on 200k rows with 4k groups, 7% of which have duplicate, non-changing updates among the most recent rows.
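
The row_number() alternative mentioned above could be sketched roughly like this. It is only an illustration of the same idea, not a tuned query: the subquery aliases flagged and changes and the column rn are made-up names, and foo and bar are assumed to be non-null.

```sql
select id, "group", foo, bar, "timestamp"
from (select *,
             -- number the "changing" rows per group, newest first
             row_number() over (partition by "group"
                                order by "timestamp" desc) as rn
      from (select *,
                   coalesce(   foo <> lag(foo) over w
                            or bar <> lag(bar) over w, true) as is_diff_from_prev
            from foobar
            window w as (partition by "group" order by "timestamp")) flagged
      where is_diff_from_prev) changes
where rn = 1;  -- keep only the most recent change per group
```

On the sample data this should return the same single row as the distinct on version (id 3).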

Upvotes: 3

Lajos Arpad

Reputation: 76882

A way to solve your issue would be this:

select fb1."group", fb1."foo", fb1."bar", min(fb1."timestamp") as "timestamp"
from foobar fb1
left join foobar fb2
on fb1."group" = fb2."group" and
   fb2."timestamp" > fb1."timestamp" and
   (fb2."foo" <> fb1."foo" or fb2."bar" <> fb1."bar")
left join foobar fb3
on fb1."group" = fb3."group" and
   fb3."timestamp" < fb1."timestamp" and
   fb3."foo" = fb1."foo" and
   fb3."bar" = fb1."bar"
left join foobar fb4
on fb4."group" = fb1."group" and
   fb4."timestamp" < fb1."timestamp" and fb4."timestamp" > fb3."timestamp" and
   (fb4."foo" <> fb1."foo" or fb4."bar" <> fb1."bar")
where fb2."group" is null and (fb3."group" is null or fb4."group" is not null)
group by fb1."group", fb1."foo", fb1."bar"

Explanation: we are searching for the fb1 records for which there is no newer fb2 record with a different foo/bar combination, and for which either no older fb3 record with the same combination exists or, if such an fb3 exists, an in-between fb4 with a different combination also exists.
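
The first condition of that explanation (no newer record with a different combination) can also be written with NOT EXISTS. The sketch below is a simplified variant rather than the query above: it keeps only that condition, on the assumption that timestamps are unique within a group and that foo and bar are never null, in which case the surviving rows are exactly the trailing run of unchanged updates.

```sql
select fb1."group", fb1.foo, fb1.bar,
       min(fb1."timestamp") as "timestamp"  -- earliest row of the trailing run
from foobar fb1
where not exists (
    -- a newer row in the same group with a different foo or bar
    select 1
    from foobar fb2
    where fb2."group" = fb1."group"
      and fb2."timestamp" > fb1."timestamp"
      and (fb2.foo <> fb1.foo or fb2.bar <> fb1.bar)
)
group by fb1."group", fb1.foo, fb1.bar;
```

On the sample data only ids 3 and 4 pass the filter, so the result is group 1, foo 10, bar 20, timestamp 3.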

Upvotes: 1

Stefanov.sm

Reputation: 13049

You may resort to the old-school way of doing a 'distinct on' selection, using the row_number window function:

select distinct on ("group", foo, bar) * from 
(
 select *, 
   row_number() over (partition by "group", foo, bar order by "timestamp") rnk
 from foobar
) t
where rnk > 1 -- ignore the initial set of values
order by "group", foo, bar, rnk; -- and pick the earliest record

Upvotes: 0