I have a time series like the one below. I want to compute lag N only where the date-time sequence is continuous, and skip the lag when I hit missing data: I don't want to compute a lag when the previous entry is more than N hours earlier. How can I do this in R?
t val
2005-01-17 17:30:00 14.3
2005-01-17 18:30:00 14.0
2005-01-17 19:30:00 14.3
2005-01-17 22:30:00 14.9
2005-01-17 23:30:00 14.2
2005-01-18 00:30:00 14.1
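To make the example reproducible, the sample data above can be built as a data frame. The name `df1` and the `tz = "UTC"` setting are assumptions added here (UTC avoids daylight-saving shifts in hourly sequences); they are not part of the original post.

```r
# Sample data from the question (df1 and tz = "UTC" are assumptions)
df1 <- data.frame(
  t = as.POSIXct(c("2005-01-17 17:30:00", "2005-01-17 18:30:00",
                   "2005-01-17 19:30:00", "2005-01-17 22:30:00",
                   "2005-01-17 23:30:00", "2005-01-18 00:30:00"),
                 tz = "UTC"),
  val = c(14.3, 14.0, 14.3, 14.9, 14.2, 14.1)
)
str(df1)
```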
There are missing entries for 2005-01-17 20:30:00 and 2005-01-17 21:30:00.
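One way to confirm which hourly slots are missing is to compare the series against a complete hourly grid built with `seq()`. This is a minimal sketch that assumes the observations are meant to be exactly one hour apart and uses `tz = "UTC"` as an assumed time zone.

```r
# Timestamps from the question; tz = "UTC" is an assumption (avoids DST shifts)
times <- as.POSIXct(c("2005-01-17 17:30:00", "2005-01-17 18:30:00",
                      "2005-01-17 19:30:00", "2005-01-17 22:30:00",
                      "2005-01-17 23:30:00", "2005-01-18 00:30:00"),
                    tz = "UTC")

# Every hourly timestamp between the first and last observation
full_grid <- seq(min(times), max(times), by = "hour")

# Hourly slots that have no observation
gaps <- full_grid[!full_grid %in% times]
print(gaps)
```

For this data the result contains the 20:30 and 21:30 slots.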
Expected output:
t val val_lag val_lag2
2005-01-17 17:30:00 14.3 NA NA
2005-01-17 18:30:00 14.0 14.3 NA
2005-01-17 19:30:00 14.3 14.0 14.3
2005-01-17 22:30:00 14.9 NA NA
2005-01-17 23:30:00 14.2 14.9 NA
2005-01-18 00:30:00 14.1 14.2 14.9
Thanks
We could create a grouping variable from the diff of the 't' column, starting a new group wherever consecutive timestamps are not exactly 1 hour apart, and then take the lag of 'val' within each group.
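library(dplyr)

# df1$t must be POSIXct (convert with as.POSIXct if it was read as text)
df1 %>%
  # a new group starts wherever the gap to the previous row is not exactly 1 hour
  group_by(grp = cumsum(c(TRUE, as.numeric(diff(t), units = "hours") != 1))) %>%
  mutate(val_lag = lag(val), val_lag2 = lag(val, 2)) %>%
  ungroup() %>%
  select(-grp)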
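A variant that generalises to any N checks timestamps directly instead of building run groups. This sketch assumes a lag-N value should be kept only when the row N positions back is exactly N hours earlier, which reproduces the expected output for lags 1 and 2. `lag_if_continuous` is a hypothetical helper name, not a dplyr function, and the sample data uses an assumed `tz = "UTC"`.

```r
library(dplyr)

# Hypothetical helper: lag `val` by n rows, but keep the lagged value only when
# the timestamp n rows back is exactly n hours earlier; otherwise return NA
lag_if_continuous <- function(val, t, n) {
  gap <- as.numeric(difftime(t, dplyr::lag(t, n), units = "hours"))
  ifelse(!is.na(gap) & gap == n, dplyr::lag(val, n), NA)
}

# Sample data from the question (tz = "UTC" is an assumption)
df1 <- data.frame(
  t = as.POSIXct(c("2005-01-17 17:30:00", "2005-01-17 18:30:00",
                   "2005-01-17 19:30:00", "2005-01-17 22:30:00",
                   "2005-01-17 23:30:00", "2005-01-18 00:30:00"),
                 tz = "UTC"),
  val = c(14.3, 14.0, 14.3, 14.9, 14.2, 14.1)
)

out <- df1 %>%
  arrange(t) %>%
  mutate(val_lag  = lag_if_continuous(val, t, 1),
         val_lag2 = lag_if_continuous(val, t, 2))
print(out)
```

Each lag column checks its own gap, so adding a third lag is just one more `mutate()` argument, and no helper grouping column needs to be dropped afterwards.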