saintlyzero
saintlyzero

Reputation: 1842

Find records where length of consecutive values is less than threshold

Here's the table

           timestamp           | tracker_id | position 
-------------------------------+------------+----------
 2020-02-01 16:23:45.571429+00 | 15         |        1
 2020-02-01 16:23:45.857143+00 | 11         |        1
 2020-02-01 16:23:46.428571+00 | 15         |        1
 2020-02-01 16:23:46.714286+00 | 11         |        2
 2020-02-01 16:23:54.714288+00 | 15         |        2
 2020-02-01 16:23:55+00        | 15         |        1
 2020-02-01 16:23:55.285714+00 | 11         |        1
 2020-02-01 16:23:55.571429+00 | 15         |        1
 2020-02-01 16:23:55.857143+00 | 15         |        1
 2020-02-01 16:23:56.428571+00 | 11         |        1
 2020-02-01 16:23:56.714286+00 | 15         |        1
 2020-02-01 16:23:57+00        | 11         |        2
 2020-02-01 16:23:58.142857+00 | 11         |        2
 2020-02-01 16:23:58.428571+00 | 15         |        1
 2020-02-01 16:23:58.714286+00 | 11         |        2
 2020-02-01 16:23:59+00        | 11         |        1
 2020-02-01 16:23:59.285714+00 | 15         |        1
 2020-02-01 16:23:59.295714+00 | 10         |        1
 2020-02-01 16:23:59.305714+00 | 10         |        2
 2020-02-01 16:23:59.385714+00 | 10         |        2
 2020-02-01 16:23:59.485714+00 | 10         |        3

Threshold = 3

Here,
position of tracker_id:15 changes from 1 -> 1 -> 2 -> 1 -> 1 -> 1 -> 2 -> 2 -> 1

position of tracker_id:11 changes from 1 -> 2 -> 1 -> 1 -> 2 -> 2 -> 2 -> 1

position of tracker_id:10 changes from 1 -> 2 -> 2 -> 3

For tracker_id:15 Maximum length of consecutive 2 between 1 is < threshold

For tracker_id:11 Maximum length of consecutive 2 between 1 is = threshold

For tracker_id:10 consecutive 2 are not enclosed between 1

The output should be tracker_id:15 because the length of consecutive 2 in position is less than the threshold

How to do this using a query?

Upvotes: 2

Views: 251

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

There is no need to handle this as a gaps-and-islands problem. Just use window functions:

select tracker_id
from (select t.*,
             min(position) over (partition by tracker_id
                                 order by timestamp
                                 rows between 2 preceding and current row
                                ) as min_pos_3,
             max(position) over (partition by tracker_id
                                 order by timestamp
                                 rows between 2 preceding and current row
                                ) as max_pos_3
      from t
     ) t
group by tracker_id
having count(*) filter (where min_pos_3 = max_pos_3) = 0

This simply looks at the minimum and maximum values over three 3 for each tracker. It returns only rows where the values are always different.

Upvotes: 1

GMB
GMB

Reputation: 222402

This is a gaps and islands problem.

You can start by building groups of adjacent records using the difference between row numbers. Then, you can aggregate each group, and use lag and lead to recover the position of the surrounding groups. The last step is to apply the filtering logic.

select tracker_id
from (
    select
        tracker_id,
        position,
        count(*) cnt,
        lag(position) over(partition by tracker_id order by max(timestamp)) lag_position,
        lead(position) over(partition by tracker_id order by max(timestamp)) lead_position
    from (
        select
            t.*,
            row_number() over(partition by tracker_id order by timestamp) rn1,
            row_number() over(partition by tracker_id, position order by timestamp) rn2
        from mytable t
    ) t
    group by tracker_id, position, rn1 - rn2
) t
where
    position = 2
    and lag_position = 1
    and lead_position = 1
group by tracker_id
having max(cnt) < 3

This demo on DB Fiddle with your sample data produces:

| tracker_id |
| ---------: |
|         15 |

Upvotes: 1

Kohelet
Kohelet

Reputation: 404

I've modified a bit Your input table (added tracker_id=9 for testing).

Window functions can solve this issue, like: row_number(), lead

select x.* 
into #temp1
from
(
    select ' 2020-02-01 16:23:45.571429+00 ' as time_stamp, 9 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:45.857143+00 ' as time_stamp, 9 as tracker_id, 3 as position UNION ALL 
    select ' 2020-02-01 16:23:46.428571+00 ' as time_stamp, 9 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:24:45.571429+00 ' as time_stamp, 9 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:25:45.857143+00 ' as time_stamp, 9 as tracker_id, 2 as position UNION ALL
    select ' 2020-02-01 16:26:45.857143+00 ' as time_stamp, 9 as tracker_id, 3 as position UNION ALL
    select ' 2020-02-01 16:27:45.857143+00 ' as time_stamp, 9 as tracker_id, 3 as position UNION ALL
    select ' 2020-02-01 16:28:46.428571+00 ' as time_stamp, 9 as tracker_id, 1 as position UNION ALL
    select ' 2020-02-01 16:23:45.571429+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:45.857143+00 ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:46.428571+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:46.714286+00 ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:54.714288+00 ' as time_stamp, 15 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:55+00        ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:55.285714+00 ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:55.571429+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:55.857143+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:56.428571+00 ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:56.714286+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:57+00        ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:58.142857+00 ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:58.428571+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:58.714286+00 ' as time_stamp, 11 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:59+00        ' as time_stamp, 11 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:59.285714+00 ' as time_stamp, 15 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:59.295714+00 ' as time_stamp, 10 as tracker_id, 1 as position UNION ALL 
    select ' 2020-02-01 16:23:59.305714+00 ' as time_stamp, 10 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:59.385714+00 ' as time_stamp, 10 as tracker_id, 2 as position UNION ALL 
    select ' 2020-02-01 16:23:59.485714+00 ' as time_stamp, 10 as tracker_id, 3 as position) x
;

select 
    *,
    ROW_NUMBER() OVER(PARTITION BY tracker_id ORDER BY time_stamp) tracker_id_rownumber,
    case when position=1 then 1 else 0 end is_pos0_equals_1, --is current row position=1?
    case when (LEAD(position, 1) OVER (PARTITION BY tracker_id ORDER BY time_stamp))=2 then 1 else 0 end is_pos1_equals_2, --is next row position=2?
    case when (LEAD(position, 2) OVER (PARTITION BY tracker_id ORDER BY time_stamp))=2 then 1 else 0 end is_pos2_equals_2, --next next row..
    case when (LEAD(position, 3) OVER (PARTITION BY tracker_id ORDER BY time_stamp))=2 then 1 else 0 end is_pos3_equals_2  --next next next row..
into #temp2
from #temp1
;

--leave only trackers with intervals of type {1, ... ,1}
select a.tracker_id, a.tracker_id_rownumber interval_start, min(b.tracker_id_rownumber) interval_end
into #temp3
from #temp2 a
inner join #temp2 b on (a.tracker_id=b.tracker_id and a.tracker_id_rownumber<b.tracker_id_rownumber)
where a.position=1 and b.position=1
group by a.tracker_id, a.tracker_id_rownumber

--check each 3-elements subset (are there any triples of consecutive '2'?) and mark triples of consecutive '2'
select a.*,b.tracker_id tracker_id_,
    case when b.interval_end - b.interval_start>=4 then
        case when (a.is_pos1_equals_2=1 and a.is_pos2_equals_2=1 and a.is_pos3_equals_2=1) then 0 else 1 end
    else
        1
    end 'is_less_than_threshold'
into #temp4
from #temp2 a 
inner join #temp3 b on a.tracker_id=b.tracker_id and a.tracker_id_rownumber between b.interval_start and b.interval_end-1

--output trackers
select a.tracker_id, min(a.is_less_than_threshold) is_ok
from #temp4 a
group by a.tracker_id
having min(a.is_less_than_threshold)=1

Output

tracker_id | is_ok

9 | 1

15 | 1

Upvotes: 1

Related Questions