I have a time series training dataset covering 2010-2016, with the number of observations per year listed in the table below. I want to perform rolling-origin cross-validation in R where the first fold uses the observations from 2010 for training and 2011 for testing, the second fold uses the data from 2010 and 2011 for training and 2012 for testing, and so on.
I have tried functions such as `rolling_origin` (from rsample) and caret's `trainControl`, but they seem to work only with a single fixed forecast horizon and a single skip value, whereas my yearly test sets have different sizes. I would greatly appreciate any help, especially a code example!
| Year | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|
| Observations | 614 | 617 | 599 | 677 | 881 | 1215 | 1208 |
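To make the fold scheme concrete, here is a small base-R sketch that uses only the yearly counts from the table to list the training and test set sizes of each expanding-window fold. The names `counts`, `n_folds` and `folds` are illustrative, not part of any package.

```r
# Observations per year, copied from the table above
counts <- setNames(c(614, 617, 599, 677, 881, 1215, 1208), 2010:2016)

# Fold k trains on years 1..k and tests on year k + 1 (expanding window)
n_folds <- length(counts) - 1
folds <- data.frame(
  test_year = names(counts)[-1],            # year used as the test set
  n_train   = cumsum(counts)[seq_len(n_folds)], # all rows up to the previous year
  n_test    = counts[-1],                    # rows in the test year
  row.names = NULL
)
print(folds)
```

Because the test sets are whole years, their sizes vary from fold to fold (617 rows in the first fold, 1208 in the last), which is why a single fixed horizon does not fit this scheme.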
Let x be the data, sorted by time, and y[i] be the year of x[i]. Then calculate u[i] as the index of the last data point in the i-th unique year, and iterate over consecutive pairs of these indexes, which mark the ends of the training and test sets. The code below returns the training and test data for each iteration, but you can replace the line marked ## with whatever calculation you need.
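```r
# Example data: x holds the observations and y[i] is the year of x[i].
# y must be sorted in increasing order, since findInterval() requires that.
y <- c(2000, 2000, 2000, 2001, 2001, 2002)
x <- 11:16

# Index of the last observation in each year
u <- unique(findInterval(y, y)) # 3, 5, 6

# Inputs are the last indexes of the training and test sets in x
f <- function(itrain, itest) {
  train <- x[seq(1, itrain)]          # everything up to the end of the training years
  test  <- x[seq(itrain + 1, itest)]  # the following year
  list(train = train, test = test) ## replace this line with your own calculation
}

# One fold per pair of consecutive year boundaries
L <- Map(f, itrain = head(u, -1), itest = tail(u, -1))
names(L) <- y[u[-1]]  # label each fold by its test year
str(L)
```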
This gives the following named list, where the names are the years of the test sets:
```
List of 2
 $ 2001:List of 2
  ..$ train: int [1:3] 11 12 13
  ..$ test : int [1:2] 14 15
 $ 2002:List of 2
  ..$ train: int [1:5] 11 12 13 14 15
  ..$ test : int 16
```
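To tie this back to the question, the sketch below applies the same idea to the yearly counts from the question's table and returns each fold as a vector of row indexes instead of the data itself. It assumes the real data are sorted by date and that `year` holds the year of each row; here `year` is simply rebuilt from the counts, and the names `counts`, `year`, `train_idx` and `test_idx` are hypothetical. caret's `trainControl()` accepts lists of integer row indexes in its `index` (training rows) and `indexOut` (held-out rows) arguments, so lists of this form could be passed there; that call is shown only as a comment.

```r
# Hypothetical stand-in for the real data: one year label per row,
# using the per-year counts from the question. Rows are assumed sorted by time.
counts <- c(614, 617, 599, 677, 881, 1215, 1208)
year   <- rep(2010:2016, times = counts)

# Last row index of each year, computed as in the answer above
u <- unique(findInterval(year, year))

# Fold k: train on rows 1..u[k], test on rows u[k]+1..u[k+1]
train_idx <- lapply(head(u, -1), seq_len)
test_idx  <- Map(function(a, b) seq(a + 1, b), head(u, -1), tail(u, -1))
names(train_idx) <- names(test_idx) <- year[u[-1]]  # label folds by test year

# For example (not run here):
# ctrl <- caret::trainControl(index = train_idx, indexOut = test_idx)
lengths(test_idx)
```

Supplying `indexOut` explicitly matters: if it is left as `NULL`, caret holds out every row that is not in `index`, which here would also include the years after the test year.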