Reputation: 191
I am using the great fable package and am trying to create a hierarchical forecast using ARIMA and ETS models, reconciling with td, mo, bu, and min trace to compare them and see which approach works best. My series has some effects late in the series that need to be regressed away, so I am trying to create a binary regressor to deal with them. I have read link1 and link2 about using the new_data argument to add a regressor to a hierarchical forecast, instead of the xreg argument, which I've used for non-hierarchical forecasts. I've had success with this approach by splitting the data into train and test sets and passing the test set to new_data, as Rob Hyndman describes in link1. The problem with the current task is that the effects that need to be modeled away all occur late in the series, so they are all in the test set.
First, here is my reproducible example data:
library(tidyverse)
library(forecast)
library(fable)
library(tsibble)
library(tsibbledata)
library(lubridate)

data <- aus_livestock %>%
  filter(State %in% c("Tasmania", "New South Wales", "Queensland"),
         as.Date(Month) > as.Date("2000-01-01")) %>%
  aggregate_key(State, Count = sum(Count)) %>%
  mutate(xreg = as.factor(if_else(as.Date(Month) > as.Date("2018-01-01") &
                                    as.Date(Month) < as.Date("2018-10-01"), 1, 0)))
I have had success in the past doing something like this:
train <- data %>%
  filter(as.Date(Month) < as.Date("2017-10-01"))

test <- data %>%
  filter(as.Date(Month) >= as.Date("2017-10-01"))

mod_data <- train %>%
  model(ets = ETS(Count),
        arima = ARIMA(Count ~ xreg)
  ) %>%
  reconcile(bu_ets = bottom_up(ets),
            td_ets = top_down(ets),
            mint_ets = min_trace(ets),
            bu_arima = bottom_up(arima),
            td_arima = top_down(arima),
            mint_arima = min_trace(arima)
  )

forc_data <- mod_data %>%
  forecast(new_data = test)

autoplot(forc_data,
         data,
         level = NULL)
But since in this case the regressor is all zeros in the train set, this, as expected, produces the error "Provided exogenous regressors are rank deficient, removing regressors: xreg1". I think what I need to do is feed all the data I have to the model rather than splitting it into train and test, but I am unsure how to forecast from that model using fable when there is no data to pass to the new_data argument. The closest I've gotten is something like this:
dates <- sort(rep(seq(as.Date("2019-01-01"), as.Date("2020-12-01"), "months"), 3))

future_data <- tibble(
  Month = dates,
  State = rep(c("Tasmania", "New South Wales", "Queensland"), 24),
  Count = 0
) %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month, key = State) %>%
  aggregate_key(State, Count = sum(Count)) %>%
  mutate(xreg = factor(0, levels = c(0, 1))) %>%
  select(-Count)

mod_data <- data %>%
  model(ets = ETS(Count),
        arima = ARIMA(Count ~ xreg)
  ) %>%
  reconcile(bu_ets = bottom_up(ets),
            td_ets = top_down(ets),
            mint_ets = min_trace(ets),
            bu_arima = bottom_up(arima),
            td_arima = top_down(arima),
            mint_arima = min_trace(arima)
  )

forc_data <- mod_data %>%
  forecast(new_data = future_data)

autoplot(forc_data,
         data,
         level = NULL)
Oddly, this code causes RStudio to crash when I try to run the forecast step, with the message "R session aborted. R has encountered a fatal error". I think this may be unrelated to the code, because I actually got this to work on my real data, but the forecasts don't look like I would expect.
So, in summary, I would like to know how I can use fable to forecast a hierarchical series with an exogenous regressor when all the regression effects need to happen in the period of the test set.
Thanks in advance for any help I can get!
Upvotes: 1
Views: 425
Reputation: 21
I think it's not possible to do this when the effect occurs only in the test set, because the model then has nothing to learn from in the train set. That is, you can only include an exogenous variable in the training process if it is present in both the train and test sets.
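To make the "nothing to learn from" point concrete, here is a minimal base-R sketch. It does not use fable, and the 24-month length and the dummy window (months 10 to 18) are hypothetical values chosen for illustration, not the asker's data. It only shows why a dummy that is all zeros in the fitting window leaves the regression design matrix rank deficient, which is what the "rank deficient" message is about.

```r
# Hypothetical 24-month series; the dummy is 1 only in months 10-18.
n <- 24
dummy <- as.integer(seq_len(n) %in% 10:18)

# Design matrix on the first 9 months only: the dummy column is all zeros.
X_train <- cbind(intercept = 1, xreg = dummy[1:9])

# Design matrix on the full series: the dummy column now varies.
X_full <- cbind(intercept = 1, xreg = dummy)

# A rank below the number of columns means the dummy's coefficient
# cannot be estimated from that window.
rank_train <- qr(X_train)$rank
rank_full <- qr(X_full)$rank
cat("train window rank:", rank_train, "of", ncol(X_train), "columns\n")
cat("full series rank:", rank_full, "of", ncol(X_full), "columns\n")
```

Fitting on a window in which the dummy takes both values removes the rank deficiency. Whether the coefficient is estimated well from only a handful of affected months is a separate question.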
Upvotes: 0