learn_more
learn_more

Reputation: 197

hive window function - row change in value

I have data which has an id /flag and date fields I need to populate flag_date field in below way

login_date      id      flag    flag_date
5/1/2018        100     N       NULL
5/2/2018        100     N       NULL
5/3/2018        100     Y       5/3/2018
5/4/2018        100     Y       5/3/2018
5/5/2018        100     Y       5/3/2018
5/6/2018        100     N       NULL
5/7/2018        100     N       NULL
5/8/2018        100     Y       5/8/2018
5/9/2018        100     Y       5/8/2018
5/10/2018       100     Y       5/8/2018

When Flag value changes to N to Y, flag_date value changes accordingly. please help.

Upvotes: 0

Views: 148

Answers (1)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49270

select login_date
      ,id
      ,flag
      ,case when flag = 'Y' then min(login_date) over(partition by id,grp) end as flag_date
from (select login_date,id,flag
            ,row_number() over(partition by id order by login_date) - 
             row_number() over(partition by id,flag order by login_date) as grp
      from tbl 
     ) t
  • First classify rows into groups, i.e. consecutive 'Y's and 'N's starting a new value when the series is broken. This can be done with a difference of row numbers approach. (Run the inner query to see how group numbers are assigned)
  • Once the groups are assigned, it is trivial to compute flag_date with conditional aggregation.

One more approach to solve this involves generating a new group whenever a 'N' value is encountered. The outer query remains the same, only the inner one changes.

select login_date
      ,id
      ,flag
      ,case when flag = 'Y' then min(login_date) over(partition by id,grp) end as flag_date 
from (select login_date,id,flag
            ,sum(case when flag = 'N' then 1 else 0 end) over(partition by id order by login_date) as grp
      from tbl 
     ) t

Upvotes: 1

Related Questions