prarane
prarane

Reputation: 39

hive - replace value of next record based on current value

I have dataset as shown in below imageenter image description here

When is_session_change value is true i am populating new value for session column and i want to carry that session value till the next TRUE value of is_session_change column. For example below image - enter image description here

Column expected_session is the expected value. Please let me know if anyone can help!! Thanks in advance. Actually i am processing the data hourly, and for particular run i am processing current hour data plus last hour data, so last hour data will have session ids attached properly, now i am trying to extend those session ids for eligible records in current hour, if record is not eligible to extend i am generating new session id.

I have tried below approach which have populated values in session column -

select *,case when (session_id != '-1')  then session_id 
              when (new_session = true) then  getSessionId() 
              when (new_session =false AND session_id = '-1') then LAG(session_id) OVER (PARTITION BY uniquevisitor ORDER BY ts) 
        else '-' END as abc 
from mktg_web.web_session_final 
order by uniquevisitor

Upvotes: 0

Views: 269

Answers (1)

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49270

You already have the occurrence column to group the values based on session. Use a case expression to set -1 values to null and use max window function to get the expected_session value.

select w.*,max(case when session <> -1 then session end) 
           over(partition by uniquevisitor,occurrence) as expected_session 
from mktg_web.web_session_final w

Upvotes: 1

Related Questions