Tomáš M.
Tomáš M.

Reputation: 744

SQL SELECT rows where the difference between consecutive columns is less than X

Basically Mysql: Find rows, where timestamp difference is less than x, but I want to stop at the first value whose timestamp difference is larger than X.

I got so far:

SELECT * 
FROM (
  SELECT *, 
         (LEAD(datetime) OVER (ORDER BY datetime)) - datetime AS difference 
  FROM history
) AS sq 
WHERE difference < '00:01:00'

Which seems to correctly return all rows where the difference between the row and the one "behind" it is less than a minute, but that means I still get large jumps in the datetimes, which I don't want - I want to select the most recent "run" of rows, where a "run" is defined as "the timestamps in datetime differ by less than a minute".

e.g., I have rows whose hypothetical timestamps are as follows:

24, 22, 21, 19, 18, 12, 11, 9, 7...

And my limit of differences is 3, i.e. I want the run of the rows whose difference between "timestamps" is less than 3; therefore just:

24, 22, 21, 19, 18

Is this possible in SQL?

Upvotes: 0

Views: 2347

Answers (3)

pozs
pozs

Reputation: 36214

This can be easily solved with recursive CTEs (this will select your rows one-by-one and stops when there is no row in range interval '1 min'):

with recursive h as (
    select * from (
      select   *
      from     history
      order by history.datetime desc
      limit    1
    ) s
  union all
    select * from (
      select   history.*
      from     h
      join     history on history.datetime >= h.datetime - interval '1 min'
      and      history.datetime < h.datetime
      order by history.datetime desc
      limit    1
    ) s
)
select * from h

This should be efficient if you have an index on history.datetime. Though, if you care about performance, you should test it against the window-function based ones. (I personally get a headache when see as much subqueries and window functions as needed for this problem. The irony in my answer is that postgresql does not support the ORDER BY clause directly inside recrursive CTEs, so I had to use 2 meaningless subqueries to "hide" them).

rextester

Upvotes: 0

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

You can use lag to get the previous row's timestamp and check if the current row is within 3 minutes of it. Reset the group if the condition fails. After this grouping is done, you have the find the latest such group, use max to get it. Then get all those rows from the latest group.

Include a partition by clause in the window functions lag, sum andmax if this has to be done for each id in the table.

with grps as (
select x.*,sum(col) over(order by dt) grp
from (select t.*
      --checking if the current row's timestamp is within 3 minutes of the next row
      ,case WHEN dt BETWEEN LAG(dt) OVER (ORDER BY dt) 
       AND LAG(dt) OVER (ORDER BY dt) + interval '3 minute' THEN 0 ELSE 1 END col
      from t) x
)
select dt 
from (select g.*,max(grp) over() maxgrp --getting the latest group
      from grps g
     ) g
where grp = maxgrp

The above would get you the members in the latest group even though it has one row. To avoid such results get the latest group which has more than 1 row.

with grps as (
select x.*,sum(col) over(order by dt) grp
from (select t.*
      ,case WHEN dt BETWEEN LAG(dt) OVER (ORDER BY dt) 
       AND LAG(dt) OVER (ORDER BY dt) + 3 THEN 0 ELSE 1 END col
      from t) x
)
,grpcnts as (select g.*,count(*) over(partition by grp) grpcnt from grps g)
select dt from (select g.*,max(grp) over() maxgrp 
                from grpcnts g 
                where grpcnt > 1
               ) g 
where grp = maxgrp

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269503

You can do this by using a flag based on the lead() or lag() values. I believe this does what you want:

SELECT h.*
FROM (SELECT h.*,
             SUM( (next_datetime < datetime + interval '1 minute')::int) OVER (ORDER BY datetime DESC) as grp
      FROM (SELECT h.*,
                   LEAD(h.datetime) OVER (ORDER BY h.datetime)) as next_datetime
            FROM history h
           ) h
      WHERE next_datetime < datetime + interval '1 hour'
     ) h
WHERE grp IS NULL OR grp = 0;

Upvotes: 0

Related Questions