Reputation: 744
Basically Mysql: Find rows, where timestamp difference is less than x, but I want to stop at the first value whose timestamp difference is larger than X.
I got so far:
SELECT *
FROM (
SELECT *,
(LEAD(datetime) OVER (ORDER BY datetime)) - datetime AS difference
FROM history
) AS sq
WHERE difference < '00:01:00'
Which seems to correctly return all rows where the difference between the row and the one "behind" it is less than a minute, but that means I still get large jumps in the datetime
s, which I don't want - I want to select the most recent "run" of rows, where a "run" is defined as "the timestamps in datetime
differ by less than a minute".
e.g., I have rows whose hypothetical timestamps are as follows:
24, 22, 21, 19, 18, 12, 11, 9, 7...
And my limit of differences is 3
, i.e. I want the run of the rows whose difference between "timestamps" is less than 3; therefore just:
24, 22, 21, 19, 18
Is this possible in SQL?
Upvotes: 0
Views: 2347
Reputation: 36214
This can be easily solved with recursive CTEs (this will select your rows one-by-one and stops when there is no row in range interval '1 min'
):
with recursive h as (
select * from (
select *
from history
order by history.datetime desc
limit 1
) s
union all
select * from (
select history.*
from h
join history on history.datetime >= h.datetime - interval '1 min'
and history.datetime < h.datetime
order by history.datetime desc
limit 1
) s
)
select * from h
This should be efficient if you have an index on history.datetime
. Though, if you care about performance, you should test it against the window-function based ones. (I personally get a headache when see as much subqueries and window functions as needed for this problem. The irony in my answer is that postgresql does not support the ORDER BY
clause directly inside recrursive CTEs, so I had to use 2 meaningless subqueries to "hide" them).
Upvotes: 0
Reputation: 49260
You can use lag
to get the previous row's timestamp and check if the current row is within 3 minutes of it. Reset the group if the condition fails. After this grouping is done, you have the find the latest such group, use max
to get it. Then get all those rows from the latest group.
Include a partition by
clause in the window functions lag
, sum
andmax
if this has to be done for each id in the table.
with grps as (
select x.*,sum(col) over(order by dt) grp
from (select t.*
--checking if the current row's timestamp is within 3 minutes of the next row
,case WHEN dt BETWEEN LAG(dt) OVER (ORDER BY dt)
AND LAG(dt) OVER (ORDER BY dt) + interval '3 minute' THEN 0 ELSE 1 END col
from t) x
)
select dt
from (select g.*,max(grp) over() maxgrp --getting the latest group
from grps g
) g
where grp = maxgrp
The above would get you the members in the latest group even though it has one row. To avoid such results get the latest group which has more than 1 row.
with grps as (
select x.*,sum(col) over(order by dt) grp
from (select t.*
,case WHEN dt BETWEEN LAG(dt) OVER (ORDER BY dt)
AND LAG(dt) OVER (ORDER BY dt) + 3 THEN 0 ELSE 1 END col
from t) x
)
,grpcnts as (select g.*,count(*) over(partition by grp) grpcnt from grps g)
select dt from (select g.*,max(grp) over() maxgrp
from grpcnts g
where grpcnt > 1
) g
where grp = maxgrp
Upvotes: 1
Reputation: 1269503
You can do this by using a flag based on the lead()
or lag()
values. I believe this does what you want:
SELECT h.*
FROM (SELECT h.*,
SUM( (next_datetime < datetime + interval '1 minute')::int) OVER (ORDER BY datetime DESC) as grp
FROM (SELECT h.*,
LEAD(h.datetime) OVER (ORDER BY h.datetime)) as next_datetime
FROM history h
) h
WHERE next_datetime < datetime + interval '1 hour'
) h
WHERE grp IS NULL OR grp = 0;
Upvotes: 0