Alex

Increase speed of batch forecast with purrr

I was reading Rob Hyndman's blog post on batch forecasting and wanted to speed it up. I tried using purrr, but that cut the run time by less than half. Below is a reproducible example with the loop from Hyndman's blog post and a purrr alternative. How can I reduce the time further?

library(forecast)
library(tidyverse)
library(purrr)
# read file
retail <- read.csv("https://robjhyndman.com/data/ausretail.csv", header = FALSE)

# Hyndman's loop
retail <- ts(retail[, -1], frequency = 12, start = 1982 + 3/12)
ns <- ncol(retail)
h <- 24
fcast <- matrix(NA, nrow = h, ncol = ns)
system.time(
  for (i in 1:ns)
    fcast[, i] <- forecast(retail[, i], h = h)$mean
)

#   user  system elapsed 
#  60.14    0.17   61.72 

# purrr try
# note: frequency = 7 below, although the series are monthly (frequency = 12 above)
system.time(
  retail_forecast <- retail %>%
    as_tibble() %>%
    map(~ts(., frequency = 7)) %>%
    map_dfc(~forecast(., h = h)$mean))

#   user  system elapsed 
#  32.23    0.03   35.32 


Tung

You can parallelize purrr functions using the furrr package. Here is an excerpt from the package page:

The goal of furrr is to simplify the combination of purrr’s family of mapping functions and future’s parallel processing capabilities. A new set of future_map_*() functions have been defined, and can be used as (hopefully) drop in replacements for the corresponding map_*() function.

The code draws heavily from the implementations of purrr and future.apply.
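A toy illustration of the "drop in replacement" idea, not taken from the furrr documentation: `slow_square()` is a hypothetical helper that sleeps to mimic an expensive model fit, and `workers = 2` is an arbitrary choice. Only the function name changes between the purrr and furrr calls, and both return the same vector.

```r
library(purrr)
library(furrr)  # also attaches future, which provides plan()

# Hypothetical stand-in for an expensive step such as fitting a forecast model
slow_square <- function(x) {
  Sys.sleep(0.5)
  x^2
}

# Sequential version with purrr
seq_out <- map_dbl(1:4, slow_square)

# Parallel version: same call shape, with future_ prefixed to the function name
plan(multisession, workers = 2)
par_out <- future_map_dbl(1:4, slow_square)
plan(sequential)  # shut down the background workers

identical(seq_out, par_out)
```

With two workers the four half-second calls can run two at a time, so the parallel call should take roughly half as long, minus the overhead of starting the sessions.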

Using furrr, I was able to reduce the elapsed time by more than a factor of three on my Linux machine (about 50.6 s for the loop and for purrr, 14.7 s for furrr):

library(forecast)
library(tidyverse)

### read file
retail <- read.csv("https://robjhyndman.com/data/ausretail.csv", header = FALSE)

### Hyndman's loop

retail <- ts(retail[, -1], frequency = 12, start = 1982 + 3 / 12)
ns <- ncol(retail)
h <- 24
fcast <- matrix(NA, nrow = h, ncol = ns)
system.time(
  for (i in 1:ns)
    fcast[, i] <- forecast(retail[, i], h = h)$mean
)

# user  system elapsed 
# 50.592   0.016  50.599
#

### purrr try (frequency = 12, matching the monthly data)

system.time(
  retail_forecast <- retail %>%
    as_tibble() %>%
    map(~ts(., frequency = 12)) %>%
    map_dfc(~ forecast(., h = h)$mean)
)

# user  system elapsed 
# 50.232   0.000  50.224 
#

### furrr try

library(furrr)
#> Loading required package: future
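# You set a "plan" for how the code should run. This timing was made with
# plan(multiprocess), which picked plan(multicore) on Linux and Mac and
# plan(multisession) on Windows; `multiprocess` is deprecated in current
# versions of future, so choose a plan explicitly:
plan(multisession)  # or plan(multicore) on Linux/Mac (forked processes; not supported on Windows)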

system.time(
  retail_forecast <- retail %>%
    as_tibble() %>%
    future_map(~ts(., frequency = 12)) %>%
    future_map_dfc(~ forecast(., h = h)$mean)
)

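# user  system elapsed 
# 0.172   0.080  14.702
# (user/system stay low because the forecasting runs in the worker
#  processes, so elapsed time is the number to compare)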

Created on 2018-08-01 by the reprex package (v0.2.0.9000).
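A variation on the furrr code above, offered as a sketch rather than a benchmarked result. It maps over column indices instead of converting tibble columns back to ts objects, so each series keeps its frequency and start date and only the `forecast()` calls are sent to the workers. The simulated `retail_demo` matrix (a stand-in so the code runs offline), the worker count, and the use of `furrr_options(packages = , seed = )` are assumptions, not part of the original answer.

```r
library(forecast)
library(furrr)  # also attaches future, which provides plan()

# Assumption: simulated monthly series stand in for the ausretail data so the
# sketch runs offline; swap in the `retail` ts matrix built above to use it.
set.seed(42)
retail_demo <- ts(matrix(rnorm(120 * 4, mean = 100, sd = 5), ncol = 4),
                  frequency = 12, start = 1982 + 3 / 12)
h <- 24

plan(multisession, workers = 2)  # worker count is an arbitrary choice

# Map over column indices: retail_demo[, i] keeps the series' frequency and
# start date, so no ts() re-conversion step is needed, and only the expensive
# forecast() calls run on the workers. packages = "forecast" makes sure the
# forecast package (and its S3 methods) is loaded in every worker session.
fc_list <- future_map(
  seq_len(ncol(retail_demo)),
  ~ as.numeric(forecast(retail_demo[, .x], h = h)$mean),
  .options = furrr_options(packages = "forecast", seed = TRUE)
)

plan(sequential)  # shut down the background workers

# Bind into an h x ns matrix, the same shape as `fcast` in Hyndman's loop
fcast_par <- do.call(cbind, fc_list)
dim(fcast_par)
```

With the real data, replacing `retail_demo` by `retail` gives a `fcast_par` with the same shape as `fcast` from Hyndman's loop; the speed-up you get will depend on how many cores the plan can use.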
