Are these prediction intervals for bagged ETS models calculated correctly? (coded in R)

Question

it would be greatly appreciated if people here could have a look at this code and try to help with assessing whether these prediction intervals are calculated correctly or what needs to be changed. I tried to produce a 48 step ahead forecast using a bagged ETS model using the forecast.baggedETS() function in R. The data are hourly traffic flows (counts) from New York City. My issue is that my point forecasts seem to be quite good, whereas the simulated prediction intervals seem to be a bit too wide and the PI means diverge from the point forecasts as well (but I suppose they can be asymmetric and they should also be wider than regular confidence intervals). To evaluate the prediction intervals I have computed the PI coverage probability (PICP) which is only a mere 71%, which seems to be quite low. I have also enclosed a graph of the point forecasts and of the PIs. data bagged model + PIs

Do these make sense or what should I change? Thank you very much in advance.

#original series
ts_ori <- ts(df_ori$V2, start = c(1,1), frequency = 24) # 

# train test split
train <- window(ts_ori, start = c(1,1), end = c(21,24))
test <- window(ts_ori, start = c(22,1), end = c(24,1))

L <- BoxCox.lambda(ts_ori)
L # the optimal lambda is 0.9176

# Box-Cox tranformed original series
ts_ori_bc <- BoxCox(x= ts_ori, lambda = L)

train_bc <- window(ts_ori_bc, start = c(1,1), end = c(21,24))
test_bc <- window(ts_ori_bc, start = c(22,1), end = c(24,1))


ets_ZZZ <- ets(y = train, model = "ZZZ")
summary(ets_ZZZ)
forecast(ets_ZZZ) %>% autoplot() + autolayer(test)


# Bergmeirs Bagging ETS function
# the function bld.mbb.bootstrap is used to calculate the bootstrapped series with the 
# Box-Cox and Loess-based decomposition (BLD) bootstrap

#num is the number of bootstrap versions to generate
#blocksize is the size of the MBB 


# ensembling

# the default for the ETS function is the "ZZZ" automated model selection 
# so the bootstrapped ETS models can theoretically choose a different ETS model everytime

# creating 48 step ahead forecasts from a regular automatically chosen ETS model
etsfc <- forecast(ets(train), h=48) 

# the bagged ETS() function implements the bagged model forecasting method from the 
# Bergmeir 2016 paper
# the default for num in the baggedETS() function is 100
baggedfc <- forecast(baggedETS(train), h=48)


# plotting the bagged forecasts against the normal ETS forecasts
# the last two lines before dev.off() remove grid and grey background color

autoplot(train, xlab = "Time", ylab = "Traffic flows") +
        autolayer(baggedfc, series="Bagged.BLD.MBB.ETS", PI=FALSE) +
        autolayer(etsfc, series="ETS", PI=FALSE) +
        guides(colour=guide_legend(title="Point Forecasts")) +
        theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
              panel.background = element_blank(), axis.line = element_line(colour = "black"))


# the bagged ETS model is the average of 100  ETS models of the bootstrapped training data

# gives the point forecasts and the min/max of the 100 ETS models 
summary(baggedfc)

# prediction intervals for the bootstrapped series

# step 1: simulating nsim = 1000 series using the MBB procedure
numsim <- 1000
sim <- bld.mbb.bootstrap(train, numsim)

# step 2: For each of these series, we fit an ETS model and simulate one sample path from that model
# A different ETS model may be selected in each case, although it will most likely select the same 
# model because the series are similar. However, the estimated parameters will be different.

# the simulate() function simulates responses from the distribution of the fitted ets(sim) object

h <- 48
future <- matrix(0, nrow=numsim, ncol=h)
for(i in seq(numsim))
        future[i,] <- simulate(ets(sim[[i]]), nsim=h, seed = 123)



# step 3: we take the means and quantiles of these simulated sample paths to form point 
# forecasts and prediction intervals.

# tsp() gives the start time in time units the end time and the frequency of a time series

start <- tsp(train)[2]+1/24
simfc <- structure(list(
        mean = ts(colMeans(future), start=start, frequency=24),
        lower = ts(apply(future, 2, quantile, prob=0.025),
                   start=start, frequency=24),
        upper = ts(apply(future, 2, quantile, prob=0.975),
                   start=start, frequency=24),
        level=95),
        class="forecast")

# step 4: plotting the bagged ETS model and the prediction intervals

etsfc <- forecast(ets(train), h=48, level=95) 


autoplot(train, ylab = "Traffic flows", xlab = "Time", series = "Training data", color = "black") +
        autolayer(simfc, series="Simulated PIs") +
        autolayer(baggedfc, PI=FALSE, series = "Bagged.BLD.MBB.ETS") +
        guides(colour=guide_legend(title="Series")) +
        theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
                panel.background = element_blank(), axis.line = element_line(colour = "black"))


# PICP

for (i in 1:length(baggedfc$mean)) {
        # creating an indicator variable for every element of the mean column and store the 
        # elements in picp_dummy
        picp_dummy <- ifelse(baggedfc$mean[i] >= simfc$lower & baggedfc$mean[i] <= simfc$upper,1,0)
}
print(picp_dummy)

picp = 1/length(baggedfc$mean) *sum(picp_dummy)

print(picp)
# the PICP for the simulated 95% PI for the bagged ETS model is 0.7083333

# MPIW

length = length(simfc$upper)
length


mpiw = (1/length)*sum(simfc$upper - simfc$lower)
print(mpiw)
# the MPIW of the simulated 95% PI is 15437.7

update

I used my data and ran the modified code from Hyndman's answer. However, I got prediction intervals which are still very wide. In his sample code with the random normal numbers the width of the prediction intervals is roughly in the same ballpark as the variation in the training data. However, when using my data the PI width is extremely large.

New PIs obtained from Hyndman's code

Are these prediction intervals for bagged ETS models calculated correctly? (coded in R)

update

Answers (1)

Related Questions