Reputation: 1029
I have the following table in a postgres database, with 3 columns:
ReaderId: String
TagId: String
Timestamp: Timestamp
ReaderId TagId Timestamp
A T1 20190101-00:00:00 * ~
A T1 20190101-00:00:00 ~
A T1 20190101-00:00:01
A T1 20190101-00:00:02
B T1 20190101-00:00:03 *
B T1 20190101-00:00:03
B T1 20190101-00:00:04
A T1 20190101-00:00:05 *
A T1 20190101-00:00:06
A T1 20190101-00:00:07
C T1 20190101-00:00:08 *
C T1 20190101-00:00:09
B T2 20190101-00:00:01 *
B T2 20190101-00:00:04
B T2 20190101-00:00:05
C T2 20190101-00:00:06 *
C T2 20190101-00:00:07
B T2 20190101-00:00:07 * ~
B T2 20190101-00:00:07 ~
B T2 20190101-00:00:08
I would like a query/function that, when supplied with a TagId, returns the first row each time that tag is read at a different reader to which it was last read at (or the first row if the tag has never been read before). The rows which are eligible to be returned according to this criteria are highlighted by a * above. If there are multiple rows which are identical and 'equally first' then only one of these should be returned (as in the case of the rows marked with a ~ above).
This function needs to be performant as the volume of data is expected to easily grow into the many millions/low billions of rows. I can create any indexes required.
My SQL is rusty, and was never very good in the first place, so any help here is appreciated!
Upvotes: 0
Views: 876
Reputation: 656331
Use lag()
like others already suggested. But you specified:
when supplied with a
TagId
So you can simplify. A bit faster, too:
SELECT reader_id, tag_id, ts
FROM (
SELECT *, lag(reader_id) OVER (ORDER BY ts) IS DISTINCT FROM reader_id AS pick
FROM tbl
WHERE tag_id = 'T1' -- your tag_id here
) sub
WHERE pick;
db<>fiddle here
Also works for NULL
values in the column reader_id
.
You can wrap this in an SQL function or prepared statement and only pass your tag_id
.
Upvotes: 2
Reputation: 121514
Use the window function lag()
:
select
reader_id, tag_id, timestamp
from (
select
reader_id, tag_id, timestamp,
lag(reader_id) over (partition by tag_id order by timestamp)
from my_table
) s
where lag is distinct from reader_id
order by tag_id, timestamp
Window functions are expensive but alternative solutions (if exist) rather cannot be cheaper. The index on (tag_id, timestamp)
will support the query.
Online demo on db<>fiddle.
Read also about window functions in the documentation.
Upvotes: 1
Reputation: 1269503
Simply use lag()
:
select t.*
from (select t.*,
lag(ReaderId) over (partition by TagId order by Timestamp) as prev_ReaderId
from t
) t
where prev_ReaderId is null or prev_ReaderId <> ReaderId;
In Postgres, you can shorten the where
clause to:
where prev_ReaderId is distinct from ReaderId
Upvotes: 1