Luckylukee
Luckylukee

Reputation: 595

Doing lag queries using hive on irregular timeseries

I am trying to get a lag of one my column on an irregular time series. The data would be as follow

time stamp (seconds), temperature
1, 20
4,12
6,13
7,18

the new dataset should be as follow

time stamp (seconds), temperature, lagged_1_temperature
1, 20,0
4,12,0
6,13,0
7,18,13

As seen just the lag for last row is a non zero.

For a typical lag I use bellow hive query inside my spark application.

"select timestamp, value ,lag(value,1) OVER (ORDER BY timestamp) as lagged_1_value"

Can I change above hive query to give me the result I want

Upvotes: 0

Views: 167

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1271131

A simple left join might be more efficient:

select t.*,
       coalesce(tprev.value, 0) as prev_value
from t left join
     t tprev
     on tprev.timestmp = t.timestmp - 1;

Upvotes: 0

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49270

You can do this with a case expression.

select t.*,
case when timestmp-coalesce(lag(timestmp,1) over(order by timestmp),0)=1 
then coalesce(lag(temperature,1) over(order by timestmp),0)
else 0 end as lagged_1_termperature
from t

Upvotes: 1

Related Questions