Reputation: 3815
Q: In the tidyverts/fable forecasting framework, with many target time series to forecast, how do I supply a different target transform parameter for each series?
In particular, I'd like to do a Box-Cox transformation of each time series but using a different lambda for each series, e.g., the lambda estimated from the Guerrero method on each series. How do I do this within the framework?
Below are a couple of my attempts. Both produce errors.
If there's not a way to do this within the framework, is there a good hack I can use? Ideally, one which can still work with hierarchical time series.
Below, I propose a hack. I'm not sure whether it would still work for hierarchical time series; I should check. Regardless, I assume there must be a better way to go about things.
library(fpp3)
# construct data in transformed space directly
z1 <- arima.sim(n=104,list(ar=0.9))
z2 <- arima.sim(n=104,list(ma=0.5))
# inverse to get data in the untransformed space
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)
# create tsibble for time series modeling
tibble(idx=1:104, y1=y1, y2=y2) %>%
pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
tsibble(index=idx, key=series) ->
dat
# estimate optimal box-cox transform lambda for each series using guerrero
# method
dat %>%
fabletools::features(value, features='guerrero') ->
lambdas
# # A tibble: 2 × 2
# series lambda_guerrero
# <chr> <dbl>
# 1 y1 0.0991
# 2 y2 0.751
# set up the optimal lambdas as exogenous regressors?
dat %>% inner_join(lambdas, by=join_by(series)) -> dat.xrg
dat.xrg %>%
model(arima=ARIMA(box_cox(value,lambda=lambda))) ->
fit
# Error in `.g()`:
# ! Response variable transformation has incompatible lengths, all arguments must be the length of the data 104 or 1.
# Run `rlang::last_trace()` to see where the error occurred.
# Try defining lambda outside, and of the length desired?
lambdas %>% pull(lambda_guerrero) %>% rep(each=104) -> lambda
length(lambda)
# [1] 208
dat %>%
model(arima=ARIMA(box_cox(value, lambda=lambda))) ->
fit
# Error in `.g()`:
# ! Response variable transformation has incompatible lengths, all arguments must be the length of the data 208 or 1.
# Run `rlang::last_trace()` to see where the error occurred.
# just going with a tidy-hack
# is this the best one can do?
dat %>%
  nest(.by=series) %>%
  inner_join(lambdas, by = "series") %>%
  mutate(
    fit=map2(
      data,
      lambda_guerrero,
      \(.dat,.lambda)
        model(
          .dat,
          arima=ARIMA(box_cox(value, lambda=.lambda))
        )
    )
  ) %>%
  unnest(cols=fit) %>%
  select(series, arima) %>%
  as_mable(key='series', model='arima') ->
  fit
# looks right
fit
# # A mable: 2 x 2
# # Key: series [2]
# series arima
# <chr> <model>
# 1 y1 <ARIMA(1,0,2)>
# 2 y2 <ARIMA(0,0,1)>
# still get access to all the nice fable tools
fit %>% accuracy()
# # A tibble: 2 × 11
# series .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
# <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 y1 arima Training 0.311 3.97 1.91 -450. 512. 0.956 0.961 -0.123
# 2 y2 arima Training 0.269 1.16 0.918 -884. 1003. 0.879 0.840 0.00342
# can make a nice plot
fit %>%
augment() %>%
ggplot(aes(x=idx, y=value)) +
geom_point() +
geom_line(aes(y=.fitted),color='blue') +
facet_grid(rows=vars(series))
Upvotes: 0
Views: 41
Reputation: 2459
Your first attempt, joining the lambda values onto the data, has only a small error. Fixing it makes the model run:
# The column name for the lambda parameter in dat.xrg is lambda_guerrero, not lambda
dat.xrg %>%
model(arima=ARIMA(box_cox(value,lambda=lambda_guerrero)))
Note, however, that the detected response variable is now box_cox(value,lambda=lambda_guerrero), not value, and so the outputs won't be automatically back-transformed. This is because value and lambda_guerrero have the same length, and so the response variable detection algorithm doesn't identify value as the intended response.
You can be explicit about which variable is the response variable with resp():
dat.xrg %>%
model(arima=ARIMA(box_cox(resp(value), lambda=lambda_guerrero))) ->
fit
This approach of providing lambda_guerrero inside the dataset, with the same length as the response, allows the transformation parameter to change over time. Accordingly, it also requires you to specify future values of lambda_guerrero when forecasting.
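As a rough sketch of what supplying those future values could look like: the example below assumes strictly positive simulated data and uses tsibble's new_data() to build the forecast horizon. The object names dat_xrg and future are illustrative only.

```r
library(fpp3)

# Simulated, strictly positive data (an assumption for illustration)
set.seed(1)
dat <- tibble(
  idx = rep(1:104, times = 2),
  series = rep(c("y1", "y2"), each = 104),
  value = c(exp(rnorm(104, 2, 0.3)), rnorm(104, 50, 5))
) |>
  as_tsibble(index = idx, key = series)

# One Guerrero lambda per series, joined onto the data as a column
lambdas <- dat |> features(value, features = guerrero)
dat_xrg <- dat |> inner_join(lambdas, by = "series")

# resp() marks value as the response; lambda_guerrero is a full-length column
fit <- dat_xrg |>
  model(arima = ARIMA(box_cox(resp(value), lambda = lambda_guerrero)))

# Future rows must carry lambda_guerrero so forecasts can be back-transformed
future <- new_data(dat_xrg, n = 10) |>
  left_join(lambdas, by = "series")

fc <- fit |> forecast(new_data = future)
```

Here each series simply repeats its own lambda over the horizon, which is the constant-lambda case expressed through new_data.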
You probably want to use a single value of lambda instead, so that it does not change over time. If a length-1 input/variable is used in transforming the response, it will be cached and re-used for forecasting. To get a length-1 variable in the transformation, you could use first(lambda_guerrero):
dat.xrg %>%
model(arima=ARIMA(box_cox(value, lambda=first(lambda_guerrero)))) ->
fit
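A minimal sketch of the consequence for forecasting: because the length-1 lambda is cached, forecast() should only need a horizon. The data here are simulated and strictly positive, which is an assumption made purely for illustration.

```r
library(fpp3)

# Simulated, strictly positive data (an assumption for illustration)
set.seed(1)
dat <- tibble(
  idx = rep(1:104, times = 2),
  series = rep(c("y1", "y2"), each = 104),
  value = c(exp(rnorm(104, 2, 0.3)), rnorm(104, 50, 5))
) |>
  as_tsibble(index = idx, key = series)

lambdas <- dat |> features(value, features = guerrero)
dat_xrg <- dat |> inner_join(lambdas, by = "series")

# first() collapses the repeated column to a single lambda per series
fit <- dat_xrg |>
  model(arima = ARIMA(box_cox(value, lambda = first(lambda_guerrero))))

# No future lambda values are supplied, only the horizon
fc <- fit |> forecast(h = 10)
```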
Or, perhaps easiest and neatest, you can calculate guerrero() directly inside the transformation:
library(fpp3)
# construct data in transformed space directly
z1 <- arima.sim(n=104,list(ar=0.9))
z2 <- arima.sim(n=104,list(ma=0.5))
# inverse to get data in the untransformed space
y1 <- fabletools::inv_box_cox(z1, lambda=0.25)
y2 <- fabletools::inv_box_cox(z2, lambda=0.75)
# create tsibble for time series modeling
tibble(idx=1:104, y1=y1, y2=y2) %>%
pivot_longer(cols=c(y1,y2), names_to='series', values_to='value') %>%
tsibble(index=idx, key=series) ->
dat
dat |>
model(arima=ARIMA(box_cox(value, lambda=guerrero(value))))
#> # A mable: 2 x 2
#> # Key: series [2]
#> series arima
#> <chr> <model>
#> 1 y1 <ARIMA(1,0,0)>
#> 2 y2 <ARIMA(0,0,1)>
Created on 2024-05-14 with reprex v2.0.2
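On the hierarchical part of the question, here is a hedged sketch rather than a confirmed recipe. It assumes the aggregate is a plain sum built with aggregate_key(), and uses simulated positive data. Since guerrero(value) is evaluated per series, the aggregated series should get its own lambda as well. The names dat_agg and fit_agg are illustrative, and reconciliation is not covered here.

```r
library(fpp3)

# Simulated, strictly positive data (an assumption for illustration)
set.seed(1)
dat <- tibble(
  idx = rep(1:104, times = 2),
  series = rep(c("y1", "y2"), each = 104),
  value = c(exp(rnorm(104, 2, 0.3)), rnorm(104, 50, 5))
) |>
  as_tsibble(index = idx, key = series)

# Add a top-level series that sums y1 and y2
dat_agg <- dat |> aggregate_key(series, value = sum(value))

# guerrero() is computed separately for y1, y2 and the aggregate
fit_agg <- dat_agg |>
  model(arima = ARIMA(box_cox(value, lambda = guerrero(value))))

fit_agg
```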
Upvotes: 1