I like working with the forecast package by R. Hyndman in R because it is excellent at producing good prediction intervals. Unfortunately, I don't know how to improve its performance further when forecasting many time series at once.
library(forecast)
library(fpp2)
# result:
# 1. parallel for 4000 fc's: 5.9
# 2. lapply for 4000 fc's: 8.7
# 3. for loop for 4000 fc's: 9.66
repetitions <- 4000
performa <- vector(mode = "list", length = 3)
ts_list <- vector(mode = "list", length = repetitions)
for (i in seq_len(repetitions)) {
  aust <- window(austourists, start = 2005)
  ts_list[[i]] <- aust
}
### standard approach
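start.time <- Sys.time()
res_fc_list <- vector(mode = "list", length = repetitions)
for (i in seq_along(ts_list)) {
  fit <- ets(ts_list[[i]])
  res_fc_list[[i]] <- forecast(fit, h = 1, level = 0.68)
}
end.time <- Sys.time()
# store elapsed seconds explicitly; a bare difftime picks its units automatically
performa[[1]] <- as.numeric(difftime(end.time, start.time, units = "secs"))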
What I have tried so far: I read that lapply should improve performance. It does, but the gain is rather small (about 15%).
### lapply approach
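start.time <- Sys.time()
fit_list <- lapply(ts_list, ets)
res_fc_list2 <- lapply(fit_list, function(x) forecast(x, h = 1, level = 0.68))
end.time <- Sys.time()
# store elapsed seconds explicitly; a bare difftime picks its units automatically
performa[[2]] <- as.numeric(difftime(end.time, start.time, units = "secs"))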
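For comparison, the fit and forecast steps can also be fused into a single lapply pass and timed with system.time(), which reports elapsed wall-clock seconds directly. This is only a minimal sketch: `fit_and_forecast`, `small_list` and the list length of 20 are made up for illustration and are not part of the benchmark above.

```r
library(forecast)
library(fpp2)  # provides the austourists data set

# Hypothetical helper: fit ETS and produce the one-step forecast in one call
fit_and_forecast <- function(x) forecast(ets(x), h = 1, level = 0.68)

# Smaller illustrative input than the 4000 series used above
aust <- window(austourists, start = 2005)
small_list <- rep(list(aust), 20)

# system.time() returns a proc_time vector; "elapsed" is wall-clock seconds
timing <- system.time(
  res_small <- lapply(small_list, fit_and_forecast)
)
elapsed_secs <- timing[["elapsed"]]
print(elapsed_secs)
```

Whether fusing the two steps changes the timing noticeably would need to be measured on the full list.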
I then tested parallelisation using the foreach and doParallel packages. This led to a speed increase of about 50%.
### parallelization
library(foreach)
library(doParallel)
#setup parallel backend to use many processors
cores <- detectCores()
cl <- makeCluster(max(1, cores - 1)) # leave one core free so the machine stays responsive
registerDoParallel(cl)
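strt <- Sys.time()
# loop over all series; each iteration fits and forecasts one series on a worker
res_fc_list3 <- foreach(i = 1:repetitions, .packages = c("forecast")) %dopar% {
  fit <- ets(ts_list[[i]])
  forecast(fit, h = 1, level = 0.68)
}
# store elapsed seconds explicitly; a bare difftime picks its units automatically
performa[[3]] <- as.numeric(difftime(Sys.time(), strt, units = "secs"))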
stopCluster(cl)
print(performa)
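A variant that might be worth timing against the foreach loop is parLapply from the parallel package, which splits the list into one chunk per worker rather than handling the series one by one. The sketch below is an assumption-laden illustration, not a measured result: the two workers, `small_list` and the list length of 20 are arbitrary choices, and whether this is faster for 4000 series is untested.

```r
library(parallel)
library(forecast)
library(fpp2)  # provides the austourists data set

# Smaller illustrative input than the 4000 series used above
aust <- window(austourists, start = 2005)
small_list <- rep(list(aust), 20)

# Two workers are an arbitrary choice for this sketch
cl <- makeCluster(2)
# Load forecast once on each worker
clusterEvalQ(cl, library(forecast))

# parLapply splits small_list into one chunk per worker
res_par <- parLapply(cl, small_list, function(x) {
  forecast(ets(x), h = 1, level = 0.68)
})
stopCluster(cl)
```

The result is a plain list of forecast objects in the same order as the input, so it can be compared directly with the lapply output.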
Can anyone give me an example of how to further improve the performance of my code? Many thanks in advance.