Supply a different transform parameter for each target time series

Question

Q: In the tidyverts/fable forecasting framework, with many target time series to forecast, how do I supply a different target transform parameter for each series?

In particular, I'd like to do a Box-Cox transformation of each time series but using a different lambda for each series, e.g., the lambda estimated from the Guerrero method on each series. How do I do this within the framework?
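
To make the goal concrete, here is a minimal base-R sketch of what "a different lambda per series" means, independent of fable. The helpers `bc` and `inv_bc` and the example values are hypothetical stand-ins written for illustration. They follow the textbook Box-Cox definition for positive data, (y^lambda - 1) / lambda, or log(y) when lambda is 0, and are not the fabletools implementation.

```r
# Hypothetical helpers: textbook Box-Cox and its inverse, for positive data only
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-6) log(y) else (y^lambda - 1) / lambda
}
inv_bc <- function(w, lambda) {
  if (abs(lambda) < 1e-6) exp(w) else (lambda * w + 1)^(1 / lambda)
}

# Made-up positive series and made-up lambdas, one lambda per series
series  <- list(y1 = c(1, 2, 4, 8), y2 = c(3, 5, 7, 9))
lambdas <- c(y1 = 0.25, y2 = 0.75)

# Transform each series with its own lambda, then invert with that same lambda
transformed <- Map(bc, series, lambdas)
restored    <- Map(inv_bc, transformed, lambdas)

# The round trip recovers the original data when each series keeps its own lambda
stopifnot(isTRUE(all.equal(restored, series)))
```

What I am after is the same pairing of series and lambda, but inside `model()`, so that forecasts are back-transformed with the matching lambda.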

Below are a couple of my attempts, both of which produce errors.

If there's not a way to do this within the framework, is there a good hack I can use? Ideally, one which can still work with hierarchical time series.

Below, I also propose a hack. I haven't yet checked whether it still works for hierarchical time series. Regardless, I assume there must be a better way to go about this.

library(fpp3)

# construct data in transformed space directly
z1 <- arima.sim(n=104,list(ar=0.9))
z2 <- arima.sim(n=104,list(ma=0.5))

# inverse to get data in the untransformed space
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)

# create tsibble for time series modeling
tibble(idx=1:104, y1=y1, y2=y2) %>% 
  pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
  tsibble(index=idx, key=series) ->
  dat

# estimate optimal box-cox transform lambda for each series using guerrero
# method
dat %>% 
  fabletools::features(value, features='guerrero') ->
  lambdas
# # A tibble: 2 × 2
#   series lambda_guerrero
#   <chr>            <dbl>
# 1 y1              0.0991
# 2 y2              0.751 

# set up the optimal lambdas as exogenous regressors?
dat %>% inner_join(lambdas, by=join_by(series)) -> dat.xrg

dat.xrg %>%
  model(arima=ARIMA(box_cox(value,lambda=lambda))) ->
  fit
# Error in `.g()`:
#   ! Response variable transformation has incompatible lengths, all arguments must be the length of the data 104 or 1.
# Run `rlang::last_trace()` to see where the error occurred.

# Try defining lambda outside, and of the length desired?
lambdas %>% pull(lambda_guerrero) %>% rep(each=104) -> lambda
length(lambda)
# [1] 208

dat %>%
  model(arima=ARIMA(box_cox(value, lambda=lambda))) ->
  fit
# Error in `.g()`:
#   ! Response variable transformation has incompatible lengths, all arguments must be the length of the data 208 or 1.
# Run `rlang::last_trace()` to see where the error occurred.

# just going with a tidy-hack
# is this the best one can do?
dat %>% 
  nest(.by=series) %>% 
  inner_join(lambdas, by = "series") %>% 
  mutate(
    fit=map2(
      data, 
      lambda_guerrero, 
      \(.dat,.lambda) 
      model(
        .dat, 
        arima=ARIMA(box_cox(value, lambda=.lambda))
        )
      )
    ) %>% 
  unnest(cols=fit) %>% 
  select(series, arima) %>% 
  as_mable(key='series', model='arima') ->
  fit

# looks right
fit
# # A mable: 2 x 2
# # Key:     series [2]
# series          arima
#          
#   1 y1     
#   2 y2     

# still get access to all the nice fable tools
fit %>% accuracy()
# # A tibble: 2 × 11
#   series .model .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
#   <chr>  <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
# 1 y1     arima  Training 0.311  3.97 1.91  -450.  512. 0.956 0.961 -0.123  
# 2 y2     arima  Training 0.269  1.16 0.918 -884. 1003. 0.879 0.840  0.00342

# can make a nice plot
fit %>% 
  augment() %>%
  ggplot(aes(x=idx, y=value)) +
  geom_point() +
  geom_line(aes(y=.fitted),color='blue') +
  facet_grid(rows=vars(series))
