lowndrul

Reputation: 3815

Supply a different transform parameter for each target time series

Q: In the tidyverts/fable forecasting framework, with many target time series to forecast, how do I supply a different target transform parameter for each series?

In particular, I'd like to do a Box-Cox transformation of each time series but using a different lambda for each series, e.g., the lambda estimated from the Guerrero method on each series. How do I do this within the framework?
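For reference, this is the Box-Cox transformation as defined in the fpp3 textbook (a modified form that allows negative values when λ > 0). Here y_t is an observation of one series, w_t is its transformed value, and λ is the per-series parameter in question:

```latex
w_t =
\begin{cases}
  \log(y_t) & \text{if } \lambda = 0, \\[4pt]
  \dfrac{\operatorname{sign}(y_t)\,|y_t|^{\lambda} - 1}{\lambda} & \text{otherwise.}
\end{cases}
```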

Below are a couple of my attempts. Both produce errors.

If there's not a way to do this within the framework, is there a good hack I can use? Ideally, one which can still work with hierarchical time series.

Below, I propose a hack. I haven't yet checked whether it still works for hierarchical time series. Regardless, I assume there must be a better way to go about things.

library(fpp3)

# construct data in transformed space directly
z1 <- arima.sim(n=104,list(ar=0.9))
z2 <- arima.sim(n=104,list(ma=0.5))

# inverse to get data in the untransformed space
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)

# create tsibble for time series modeling
tibble(idx=1:104, y1=y1, y2=y2) %>% 
  pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
  tsibble(index=idx, key=series) ->
  dat

# estimate optimal box-cox transform lambda for each series using guerrero
# method
dat %>% 
  fabletools::features(value, features='guerrero') ->
  lambdas
# # A tibble: 2 × 2
# series lambda_guerrero
# <chr>            <dbl>
# 1 y1              0.0991
# 2 y2              0.751 

# set up the optimal lambdas as exogenous regressors?
dat %>% inner_join(lambdas, by=join_by(series)) -> dat.xrg

dat.xrg %>%
  model(arima=ARIMA(box_cox(value,lambda=lambda))) ->
  fit
# Error in `.g()`:
#   ! Response variable transformation has incompatible lengths, all arguments must be the length of the data 104 or 1.
# Run `rlang::last_trace()` to see where the error occurred.

# Try defining lambda outside, and of the length desired?
lambdas %>% pull(lambda_guerrero) %>% rep(each=104) -> lambda
length(lambda)
# [1] 208

dat %>%
  model(arima=ARIMA(box_cox(value, lambda=lambda))) ->
  fit
# Error in `.g()`:
#   ! Response variable transformation has incompatible lengths, all arguments must be the length of the data 208 or 1.
# Run `rlang::last_trace()` to see where the error occurred.

# just going with a tidy-hack
# is this the best one can do?
dat %>% 
  nest(.by=series) %>% 
  inner_join(lambdas, by = "series") %>% 
  mutate(
    fit=map2(
      data, 
      lambda_guerrero, 
      \(.dat,.lambda) 
      model(
        .dat, 
        arima=ARIMA(box_cox(value, lambda=.lambda))
        )
      )
    ) %>% 
  unnest(cols=fit) %>% 
  select(series, arima) %>% 
  as_mable(key='series', model='arima') ->
  fit

# looks right
fit
# # A mable: 2 x 2
# # Key:     series [2]
# series          arima
# <chr>         <model>
#   1 y1     <ARIMA(1,0,2)>
#   2 y2     <ARIMA(0,0,1)>

# still get access to all the nice fable tools
fit %>% accuracy()
# # A tibble: 2 × 11
# series .model .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
# <chr>  <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
# 1 y1     arima  Training 0.311  3.97 1.91  -450.  512. 0.956 0.961 -0.123  
# 2 y2     arima  Training 0.269  1.16 0.918 -884. 1003. 0.879 0.840  0.00342

# can make a nice plot
fit %>% 
  augment() %>%
  ggplot(aes(x=idx, y=value)) +
  geom_point() +
  geom_line(aes(y=.fitted),color='blue') +
  facet_grid(rows=vars(series))

[Plot: observed values (points) and fitted values (blue line) against idx, one panel per series]

Upvotes: 0

Views: 41

Answers (1)

Mitchell O'Hara-Wild

Reputation: 2459

There is only a slight error in your first attempt (the one that joins the lambda values onto the data); fixing it makes the model fit:

# The column name for the lambda parameter in dat.xrg is lambda_guerrero, not lambda
dat.xrg %>%
  model(arima=ARIMA(box_cox(value,lambda=lambda_guerrero))) 

Note, however, that the detected response variable is now box_cox(value,lambda=lambda_guerrero), not value, so the outputs won't be automatically back-transformed. This is because value and lambda_guerrero have the same length, so the response variable detection algorithm can't identify value as the intended response.

You can be explicit about what variable is the response variable with resp():

dat.xrg %>%
  model(arima=ARIMA(box_cox(resp(value), lambda=lambda_guerrero))) ->
  fit

This approach of providing lambda_guerrero inside the dataset with the same length as the response allows the transformation parameter to change over time. Accordingly this also requires you to specify future values of lambda_guerrero when forecasting.
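As a rough sketch of what supplying future values could look like: the block below rebuilds the question's simulated data, fits the resp() model, and then builds the future rows with new_data() before joining the per-series lambdas back on. The horizon of 12 and the object names future and fc are my own illustrative choices, and I have not verified the output, so treat this as an assumption-laden example rather than the definitive workflow.

```r
library(fpp3)

# same simulated data as in the question
z1 <- arima.sim(n=104, list(ar=0.9))
z2 <- arima.sim(n=104, list(ma=0.5))
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)

tibble(idx=1:104, y1=y1, y2=y2) %>%
  pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
  tsibble(index=idx, key=series) ->
  dat

# one Guerrero lambda per series, joined onto the data as a column
dat %>% features(value, features=guerrero) -> lambdas
dat %>% inner_join(lambdas, by='series') -> dat.xrg

dat.xrg %>%
  model(arima=ARIMA(box_cox(resp(value), lambda=lambda_guerrero))) ->
  fit

# the future rows must also carry lambda_guerrero
# (n=12 is an arbitrary horizon chosen for illustration)
new_data(dat.xrg, n=12) %>%
  left_join(lambdas, by='series') ->
  future

fit %>% forecast(new_data=future) -> fc
fc
```

Here the future lambdas simply repeat each series' estimated value, which keeps the transformation constant in practice even though the model treats it as time-varying.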

If you don't want lambda to change over time, you probably want a single value of lambda per series instead. If a length-1 input/variable is used in transforming the response, it will be cached and re-used for forecasting. To have a length-1 variable in the transformation, you could use first(lambda_guerrero):

dat.xrg %>%
  model(arima=ARIMA(box_cox(value, lambda=first(lambda_guerrero)))) ->
  fit
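Because the length-1 lambda is cached, forecasting from such a fit should need no extra inputs beyond a horizon. A minimal self-contained sketch, again using the question's simulated data (h=12 and the name fc are arbitrary choices for illustration):

```r
library(fpp3)

# same simulated data as in the question
z1 <- arima.sim(n=104, list(ar=0.9))
z2 <- arima.sim(n=104, list(ma=0.5))
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)

tibble(idx=1:104, y1=y1, y2=y2) %>%
  pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
  tsibble(index=idx, key=series) ->
  dat

# one Guerrero lambda per series, joined onto the data as a column
dat %>% features(value, features=guerrero) -> lambdas

dat %>%
  inner_join(lambdas, by='series') %>%
  model(arima=ARIMA(box_cox(value, lambda=first(lambda_guerrero)))) ->
  fit

# no future lambda values are supplied; the cached value is re-used
fit %>% forecast(h=12) -> fc
fc
```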

Or perhaps easiest and neatest, you can directly calculate guerrero() inside the transformation:

library(fpp3)

# construct data in transformed space directly
z1 <- arima.sim(n=104,list(ar=0.9))
z2 <- arima.sim(n=104,list(ma=0.5))

# inverse to get data in the untransformed space
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)

# create tsibble for time series modeling
tibble(idx=1:104, y1=y1, y2=y2) %>% 
  pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
  tsibble(index=idx, key=series) ->
  dat

dat |> 
  model(arima=ARIMA(box_cox(value, lambda=guerrero(value))))
#> # A mable: 2 x 2
#> # Key:     series [2]
#>   series          arima
#>   <chr>         <model>
#> 1 y1     <ARIMA(1,0,0)>
#> 2 y2     <ARIMA(0,0,1)>

Created on 2024-05-14 with reprex v2.0.2

Upvotes: 1
