I was reading a blog post on batch forecasting and wanted to increase the speed. I tried using purrr, but it only reduced the time by less than half. Below is a reproducible example showing the code from Hyndman's blog post and a purrr alternative. How can I reduce this time?
library(forecast)
library(tidyverse)
library(purrr)
# read file
retail <- read.csv("https://robjhyndman.com/data/ausretail.csv", header = FALSE)
# Hyndman's loop: drop the first column and convert to a monthly ts starting April 1982
retail <- ts(retail[, -1], frequency = 12, start = 1982 + 3/12)
ns <- ncol(retail)
h <- 24
fcast <- matrix(NA,nrow=h,ncol=ns)
system.time(
for(i in 1:ns)
fcast[,i] <- forecast(retail[,i],h=h)$mean
)
# user system elapsed
# 60.14 0.17 61.72
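# purrr attempt
# note: retail is monthly; frequency = 7 fits models with a different seasonal
# period than the loop above, so the two timings are not directly comparable
system.time(
retail_forecast <- retail %>%
as_tibble() %>%
map(~ts(., frequency = 7)) %>%
map_dfc(~forecast(., h = h)$mean))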
# user system elapsed
# 32.23 0.03 35.32
You can parallelize purrr functions using the furrr package. Here is an excerpt from the package page:
"The goal of furrr is to simplify the combination of purrr’s family of mapping functions and future’s parallel processing capabilities. A new set of future_map_*() functions have been defined, and can be used as (hopefully) drop in replacements for the corresponding map_*() function. The code draws heavily from the implementations of purrr and future.apply."
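To illustrate the drop-in idea on a toy problem, here is a minimal sketch. The slow_square() function is hypothetical (it stands in for an expensive model fit), and the choice of two workers is arbitrary; the point is that only the function name changes from map_dbl() to future_map_dbl().

```r
# Minimal sketch: furrr's future_map_*() as a drop-in for purrr's map_*().
library(purrr)
library(furrr)  # attaches future, which provides plan()

# Hypothetical slow function standing in for an expensive model fit.
slow_square <- function(x) {
  Sys.sleep(0.5)  # simulate half a second of work per element
  x^2
}

# Sequential version with purrr.
seq_res <- map_dbl(1:4, slow_square)

# Parallel version: same call shape, only the function name changes.
plan(multisession, workers = 2)  # two background R sessions
par_res <- future_map_dbl(1:4, slow_square)
plan(sequential)  # shut down the workers and return to sequential evaluation

print(par_res)
```

Both calls return the same numeric vector; with two workers the parallel call should take roughly half the wall-clock time, minus the overhead of starting the sessions.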
Using furrr, I was able to reduce the computing time by more than a factor of three on my Linux machine:
library(forecast)
library(tidyverse)
### read file
retail <- read.csv("https://robjhyndman.com/data/ausretail.csv", header = FALSE)
retail <- ts(retail[, -1], frequency = 12, start = 1982 + 3 / 12)
ns <- ncol(retail)
h <- 24
fcast <- matrix(NA, nrow = h, ncol = ns)
system.time(
for (i in 1:ns)
fcast[, i] <- forecast(retail[, i], h = h)$mean
)
# user system elapsed
# 50.592 0.016 50.599
#
system.time(
retail_forecast <- retail %>%
as_tibble() %>%
map(~ts(., frequency = 12)) %>%
map_dfc(~ forecast(., h = h)$mean)
)
# user system elapsed
# 50.232 0.000 50.224
#
library(furrr)
#> Loading required package: future
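# You set a "plan" for how the code should run. The timings below used
# plan(multiprocess), which picked multicore on Linux/macOS and multisession on
# Windows; multiprocess has since been deprecated in future, so pick one explicitly.
# multisession runs background R sessions and works on every OS.
plan(multisession)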
system.time(
retail_forecast <- retail %>%
as_tibble() %>%
future_map(~ts(., frequency = 12)) %>%
future_map_dfc(~ forecast(., h = h)$mean)
)
# user system elapsed
# 0.172 0.080 14.702
#
Created on 2018-08-01 by the reprex package (v0.2.0.9000).
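Because each column is forecast independently, the loop is embarrassingly parallel, so the speed-up does not depend on furrr specifically. As a hedged sketch of the same idea with base R's parallel package (a swapped-in technique, not the approach above), here is a version that runs offline. Assumptions: simulated monthly series replace the ausretail.csv download, and stats::HoltWinters() stands in for forecast::forecast() so no extra packages are needed; the fit_one() helper and the two-worker cluster are illustrative choices.

```r
# Sketch: column-by-column forecasting in parallel with base R's parallel package.
# Assumptions: simulated data instead of ausretail.csv, and stats::HoltWinters()
# as a stand-in for forecast::forecast().
library(parallel)

set.seed(1)
n_obs <- 120  # ten years of monthly data
ns <- 4       # number of series (columns)
h <- 24       # forecast horizon

# Hypothetical data: monthly series with trend, yearly seasonality and noise.
t_idx <- seq_len(n_obs)
sim <- sapply(seq_len(ns), function(j) {
  100 + 0.5 * j * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(n_obs)
})
series <- ts(sim, frequency = 12, start = c(1982, 4))

# Fit one column and return its h-step point forecasts as a plain vector.
fit_one <- function(i, data, h) {
  fit <- stats::HoltWinters(data[, i])
  as.numeric(predict(fit, n.ahead = h))
}

cl <- makeCluster(2)  # two PSOCK worker processes (portable, works on Windows)
res <- parLapply(cl, seq_len(ns), fit_one, data = series, h = h)
stopCluster(cl)

# Bind the per-column forecasts into an h x ns matrix, like fcast above.
fcast <- do.call(cbind, res)
dim(fcast)
```

On Linux or macOS, parallel::mclapply() forks instead of starting new sessions and avoids copying data to workers, but it does not run in parallel on Windows.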