Phani
Phani

Reputation: 19

find the consecutive values in impala

I have a data set below with ID, Date and Value. I want to flag the ID where three consecutive days has value 0.

id date value
1 8/10/2021 1
1 8/11/2021 0
1 8/12/2021 0
1 8/13/2021 0
1 8/14/2021 5
2 8/10/2021 2
2 8/11/2021 3
2 8/12/2021 0
2 8/13/2021 0
2 8/14/2021 6
3 8/10/2021 3
3 8/11/2021 4
3 8/12/2021 0
3 8/13/2021 0
3 8/14/2021 0

output

id date value Flag
1 8/10/2021 1 Y
1 8/11/2021 0 Y
1 8/12/2021 0 Y
1 8/13/2021 0 Y
1 8/14/2021 5 Y
2 8/10/2021 2 N
2 8/11/2021 3 N
2 8/12/2021 0 N
2 8/13/2021 0 N
2 8/14/2021 6 N
3 8/10/2021 3 Y
3 8/11/2021 4 Y
3 8/12/2021 0 Y
3 8/13/2021 0 Y
3 8/14/2021 0 Y

Thank you.

Upvotes: 0

Views: 264

Answers (2)

leftjoin
leftjoin

Reputation: 38325

Using window count() function you can count 0's in the frame [current row, 2 following] (ordered by date) - three consecutive rows frame calculated for each row:

count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt.

If count happens to equal 3 then it means 3 consecutive 0's found, case expression produces Y for each row with cnt=3 : case when cnt=3 then 'Y' else 'N' end.

To propagate 'Y' flag to the whole id group use max(...) over (partition by id)

Demo with your data example (tested on Hive):

with mydata as (--Data example, dates converted to sortable format yyyy-MM-dd
select 1 id,'2021-08-10' date_, 1 value union all
select 1,'2021-08-11',0 union all
select 1,'2021-08-12',0 union all
select 1,'2021-08-13',0 union all
select 1,'2021-08-14',5 union all
select 2,'2021-08-10',2 union all
select 2,'2021-08-11',3 union all
select 2,'2021-08-12',0 union all
select 2,'2021-08-13',0 union all
select 2,'2021-08-14',6 union all
select 3,'2021-08-10',3 union all
select 3,'2021-08-11',4 union all
select 3,'2021-08-12',0 union all
select 3,'2021-08-13',0 union all
select 3,'2021-08-14',0
) --End of data example, use your table instead of this CTE

select id, date_, value, 
       max(case when cnt=3 then 'Y' else 'N' end) over (partition by id) flag
from
(
select id, date_, value, 
 count(case when value=0 then 1 else null end) over(partition by id order by date_ rows between current row and 2 following ) cnt
from mydata
)s
  order by id, date_  --remove ordering if not necessary
                      --added it to get result in the same order

Result:

id  date_       value   flag    
1   2021-08-10  1       Y
1   2021-08-11  0       Y
1   2021-08-12  0       Y
1   2021-08-13  0       Y
1   2021-08-14  5       Y
2   2021-08-10  2       N
2   2021-08-11  3       N
2   2021-08-12  0       N
2   2021-08-13  0       N
2   2021-08-14  6       N
3   2021-08-10  3       Y
3   2021-08-11  4       Y
3   2021-08-12  0       Y
3   2021-08-13  0       Y
3   2021-08-14  0       Y

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269753

You can identify the ids by comparing lag()s. Then spread the value across all rows. The following gets the flag on the third 0:

select t.*,
       (case when value = 0 and prev_value_date_2 = prev_date_2
             then 'Y' else 'N'
        end) as flag_on_row
from (select t.*,
             lag(date, 2) over (partition by value, id order by date) as prev_value_date_2,
             lag(date, 2) over (partition by id order by date) as prev_date_2
      from t
     ) t;

The above logic uses lag() so it is easy to extend to longer streaks of 0s. The "2" is looking two rows behind, so if the lagged values are the same, then there are three rows in a row with the same value.

And to spread the value:

select t.*, max(flag_on_row) over (partition by id) as flag
from (select t.*,
             (case when value = 0 and prev_value_date_2 = prev_date_2
                   then 'Y' else 'N'
              end) as flag_on_row
      from (select t.*,
                   lag(date, 2) over (partition by value, id order by date) as prev_value_date_2,
                   lag(date, 2) over (partition by id order by date) as prev_date_2
            from t
           ) t
     ) t;

Upvotes: 0

Related Questions