KKP
KKP

Reputation: 77

Redshift count consecutive days based on a condition and break(stop counting) when the condition doesn't

Table

Date    Region_1    Region_2
27-Sep  100       97.69
28-Sep  53.84   98.21
29-Sep  88.88   10
30-Sep  8.33    96
01-Oct  25     97.63
02-Oct  20     94.82
03-Oct  25     100
04-Oct  12.5    60.86
05-Oct  10      67.29
06-Oct  42.85   63.85
07-Oct  7.14    66.25
08-Oct  14.28   71.05
09-Oct  16.66   69.48
10-Oct  25      72.64
11-Oct  12.5    0

expected output

Region_1_Compliant_days      Region_2_compliant_Days
1                                       2

I'm trying to count consecutive days for each region only when the percent is greater than 60 %. But I'm not getting the right result.

case when region_1 > 60.0 and count(date) over (partition by date desc) then 1 else 0 end

But the above is calculating all the instances greater than 60.0 and not just the consecutive days.

explanation: region1 on 27th is 100. Hence it is counted as it is greater than 60. but on 28th it is 50. Since it doesn't meet the condition I'll stop counting from there. Same goes with region2

Region2 has a lot of days more than 60. But the reason I'm considering only 2 days is: on 27th and 28th the value is greater than 60 but on 29th it is 10., which means the chain breaks there. So I'm only counting first two days.. even though we have more 60 for the rest of the days post 29th.

Please let me know how can this is achieved.

Thanks

Upvotes: 1

Views: 810

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271151

Assuming the dates are consecutive, you don't need window functions:

select datediff(day,
                min(date),
                min(case when region_1 <= 0.6 then date end)
               ) as region_1,
       datediff(day,
                min(date),
                min(case when region_2 <= 0.6 then date end)
               ) as region_2
from t;

This simply returns the number of days between the first date in the data and the minimum date that does not meet your condition. To me, this seems like the simplest way to answer the question.

If there are no days less than 0.6, then the logic needs a slight tweak:

select datediff(day,
                min(date),
                coalesce(min(case when region_1 <= 0.6 then date end), max(date) + interval '1 day')
               ) as region_1,
       datediff(day,
                min(date),
                coalesce(min(case when region_2 <= 0.6 then date end), max(date) + interval '1 day')
               ) as region_2
from t;

Upvotes: 0

Sujitmohanty30
Sujitmohanty30

Reputation: 3316

Could you try with below query,

select sum(Region_1_Compliant_days) Region_1_Compliant_days
      ,sum(Region_2_Compliant_days) Region_2_Compliant_days
  from
(
select d.*
      ,min(case when region_1 > 60 then 1 else 0 end) over (order by dt rows between unbounded preceding and current row) Region_1_Compliant_days
      ,min(case when region_2 > 60 then 1 else 0 end) over (order by dt rows between unbounded preceding and current row) Region_2_Compliant_days
from table1 d
) t;

Using window clause we always look for a window between all previous rows and current row and if we found min as 0 which means there is a value < 60 and then set to 0 for all afterwards

Hope I make sense and it works as expected.

Upvotes: 3

Related Questions