Reputation: 572
I am trying to figure out how to solve this problem in R. I want to apply different machine learning regression models to time series data, which means framing the series as a supervised learning problem. For that I need a function or package that lets me shift the data n steps forward and n steps back, like a sliding window. The table below shows the input (t-n) and output (t+n) variables, with the current observation (t) treated as an output.
  var1(t-1) var2(t-1) var1(t) var2(t) var1(t+1) var2(t+1)
1         4        69       5      70         6        71
2         5        70       6      71         7        72
3         6        71       7      72         8        73
4         7        72       8      73         9        74
5         8        73       9      74        10        75
6         9        74      10      75        11        76
7        10        75      11      76        12        77
8        11        76      12      77        13        78
I have already looked into some useful methods such as lag() and the shift() function from r-blogger.com, but the problem with these examples is that they generate missing values.
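shift <- function(x, shift_by) {
  stopifnot(is.numeric(shift_by))
  stopifnot(is.numeric(x))

  # Several shifts at once: return one column per shift
  if (length(shift_by) > 1)
    return(sapply(shift_by, shift, x = x))

  out <- NULL
  abs_shift_by <- abs(shift_by)
  if (shift_by > 0)        # lead: drop the first values, pad the end with NA
    out <- c(tail(x, -abs_shift_by), rep(NA, abs_shift_by))
  else if (shift_by < 0)   # lag: pad the start with NA, drop the last values
    out <- c(rep(NA, abs_shift_by), head(x, -abs_shift_by))
  else
    out <- x
  out
}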
Result of the shift() function:
    x df_lead2 df_lag2
1   5        4      NA
2   6        5      NA
3   7        6       5
4   8        7       6
5   9        8       7
6  10        9       8
7  11       10       9
8  12       11      10
9  13       NA      11
10 14       NA      12
So, are there any packages or built-in functions that take a data frame and compute, for each variable, the requested t-n and t+n columns without producing missing values?
It would be great if someone could help me. Thanks!
Upvotes: 1
Views: 587
Reputation: 2070
You might be able to use rollapply (zoo):
library(zoo)
rollapply(iris$Sepal.Length, width = 3, by = 2, FUN = mean, align = "left")
You can specify whether or not a value is computed when there is no subsequent value to complete the window (see https://rdrr.io/cran/rowr/man/rollApply.html).
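If what you need is the lag/lead table from the question rather than a rolling statistic, a plain base R sketch along these lines might be enough. Note that series_to_supervised, n_in and n_out are names made up here for illustration; they are not part of zoo or any other package. The idea is to keep only the rows where the whole window fits inside the data, so no NA values are produced.

```r
# Hypothetical helper (not from any package): builds one block of columns per
# time offset from t-n_in to t+n_out and keeps only rows with a full window.
series_to_supervised <- function(df, n_in = 1, n_out = 1) {
  stopifnot(is.data.frame(df), n_in >= 0, n_out >= 0)
  n <- nrow(df)
  stopifnot(n > n_in + n_out)

  # Index of the earliest observation in each complete window
  rows <- seq_len(n - n_in - n_out)

  blocks <- lapply(seq(-n_in, n_out), function(k) {
    # Rows of df that sit k steps away from the "current" observation t
    block <- df[rows + n_in + k, , drop = FALSE]
    suffix <- if (k == 0) "(t)" else sprintf("(t%+d)", k)
    names(block) <- paste0(names(df), suffix)
    rownames(block) <- NULL
    block
  })

  # cbind on data frames keeps the column names as they are
  do.call(cbind, blocks)
}

# Example data assumed to match the table in the question
df <- data.frame(var1 = 4:13, var2 = 69:78)
res <- series_to_supervised(df, n_in = 1, n_out = 1)
print(res)
```

With 10 input rows and one step in each direction this keeps 8 rows, because the first and last observations have no complete window. If you would rather keep those rows, padding with NA as shift() does and calling na.omit() afterwards is an alternative.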
Upvotes: 1