Saïd Maanan
Saïd Maanan

Reputation: 687

Prophet forecasts exceed the capacity

I built a prophet model that will allow forecasting the annual coverage of 4G mobile network.

> data
          ds    y    Expln
1 2012-08-13 59.0  1.00000
2 2013-08-13 59.0 21.79000
3 2014-08-13 59.0 59.10000
4 2015-08-13 65.0 71.00000
5 2016-08-13 65.0 79.26572
6 2017-08-13 67.0 85.57209
7 2018-08-13 79.0 89.87105
8 2019-08-13 82.0 92.98643
9 2020-08-13 97.8 95.98289
> pred_data
           ds     Expln
1  2012-08-13   1.00000
2  2013-08-13  21.79000
3  2014-08-13  59.10000
4  2015-08-13  71.00000
5  2016-08-13  79.26572
6  2017-08-13  85.57209
7  2018-08-13  89.87105
8  2019-08-13  92.98643
9  2020-08-13  95.98289
10 2021-08-13  97.51558
11 2022-08-13  98.36271
12 2023-08-13  98.85595
13 2024-08-13  99.14414
14 2025-08-13 100.00000
15 2026-08-13 100.00000
16 2027-08-13 100.00000
17 2028-08-13 100.00000
18 2029-08-13 100.00000
19 2030-08-13 100.00000

Obviously the coverage cannot exceed 100%. But when I build the following model:

make_model = function(df) {
  df$cap <- 100
  m <- prophet(growth = 'logistic')
  m <- add_regressor(m, name = 'Expln')
  m <- fit.prophet(m, df)
  return(m)
}

make_frcst = function(m, df) {
  future <- make_future_dataframe(m, 10, freq = 'year', include_history = TRUE)
  future$Expln <- df$Expln
  future$cap <- 100
  fcst <- predict(m, future)
  return(fcst)
}

> model = make_model(data)
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 6
> frcst = make_frcst(model, pred_data)

And plot the results, it gives this:

> plot(model, frcst)
> 

Forecats

Why did the forecast exceed the 100 mark although I made the cap to be 100? The regressor also stops at 100. Can someone help me fix it?

Edit:

This is the dput() of the data I used:

> dput(data)
structure(list(ds = structure(c(15565, 15930, 16295, 16660, 17026, 
17391, 17756, 18121, 18487), class = "Date"), y = c(59, 59, 59, 
65, 65, 67, 79, 82, 97.8), Expln = c(1, 21.79, 59.1, 71, 79.26572496, 
85.57208641, 89.87104819, 92.98642948, 95.98289404)), class = "data.frame", row.names = c(NA, 
-9L))

> dput(pred_data)
structure(list(ds = structure(c(15565, 15930, 16295, 16660, 17026, 
17391, 17756, 18121, 18487, 18852, 19217, 19582, 19948, 20313, 
20678, 21043, 21409, 21774, 22139), class = "Date"), Expln = c(1, 
21.79, 59.1, 71, 79.26572496, 85.57208641, 89.87104819, 92.98642948, 
95.98289404, 97.51557771, 98.36271286, 98.85595395, 99.14414047, 
100, 100, 100, 100, 100, 100)), class = "data.frame", row.names = c(NA, 
-19L))

Upvotes: 0

Views: 539

Answers (1)

Maurits Evers
Maurits Evers

Reputation: 50678

On revisiting this post, here is what I would do:

  • Use rstanarm to fit a Beta regression model of the form y ~ year + Expln to the data
  • To forecast, we need values for year as well as Expln. While choosing future values for year is easy, you need to think harder about future values for Expln. This would also be necessary when you do forecasting in prophet with external regressors. In this example and for demonstration purposes, I simply duplicate the same set of Expln values from the past 9 years.

Let's start by loading necessary libraries

library(rstanarm)
library(lubridate)
library(tidyverse)
library(tidybayes)

We then create a new year column, since your data seems to be yearly data. We also convert percentages to fractions [0, 1].

data <- data %>% mutate(year = year(as.Date(ds)), y = y / 100)

We can then fit a Beta regression model in rstanarm (note that this requires you to install package betareg).

fit <- stan_betareg(y ~ year + Expln, data = data)

We can now forecast; as mentioned in the beginning we simply forecast 9 years into the future, and copy Expln values from the previous 9 years. We then show posterior draws from the expectation of the posterior predictive.


data %>%
    mutate(year = map(year, `+`, c(0, 10))) %>%
    unnest(year) %>%
    arrange(year) %>%
    add_epred_draws(fit) %>%
    ggplot(aes(x = year, y = y)) + 
    stat_lineribbon(aes(y = .epred), alpha = 0.5) +
    geom_point(data = data) + 
    scale_y_continuous(labels = scales::percent, limits = c(0, 1)) + 
    scale_fill_brewer(palette = "Greys") + 
    theme_minimal()

enter image description here

A couple of observations:

  • You can see how forecasts never exceed 100%.
  • The kink in the (95% uncertainty band of the) forecasts in 2022 originates from the Expln value which has been copied from the 2012 value. This is (probably obviously) wrong.

As would have been the case with forecasting in prophet in the presence of external regressors, a two stage forecasting process is probably required:

  1. Forecast Expln values based on domain-level knowledge or an independent forecasting model Expln ~ year.
  2. Generate Expln for future years, then use those values and future year values to forecast y using this Beta regression model.

Upvotes: 1

Related Questions