Reputation: 595
I am trying to get a lag of one my column on an irregular time series. The data would be as follow
time stamp (seconds), temperature
1, 20
4,12
6,13
7,18
the new dataset should be as follow
time stamp (seconds), temperature, lagged_1_temperature
1, 20,0
4,12,0
6,13,0
7,18,13
As seen just the lag for last row is a non zero.
For a typical lag I use bellow hive query inside my spark application.
"select timestamp, value ,lag(
value,1) OVER (ORDER BY timestamp) as lagged_1_value"
Can I change above hive query to give me the result I want
Upvotes: 0
Views: 167
Reputation: 1271131
A simple left join
might be more efficient:
select t.*,
coalesce(tprev.value, 0) as prev_value
from t left join
t tprev
on tprev.timestmp = t.timestmp - 1;
Upvotes: 0
Reputation: 49270
You can do this with a case
expression.
select t.*,
case when timestmp-coalesce(lag(timestmp,1) over(order by timestmp),0)=1
then coalesce(lag(temperature,1) over(order by timestmp),0)
else 0 end as lagged_1_termperature
from t
Upvotes: 1