Reputation: 687
I built a prophet model that will allow forecasting the annual coverage of 4G mobile network.
> data
ds y Expln
1 2012-08-13 59.0 1.00000
2 2013-08-13 59.0 21.79000
3 2014-08-13 59.0 59.10000
4 2015-08-13 65.0 71.00000
5 2016-08-13 65.0 79.26572
6 2017-08-13 67.0 85.57209
7 2018-08-13 79.0 89.87105
8 2019-08-13 82.0 92.98643
9 2020-08-13 97.8 95.98289
> pred_data
ds Expln
1 2012-08-13 1.00000
2 2013-08-13 21.79000
3 2014-08-13 59.10000
4 2015-08-13 71.00000
5 2016-08-13 79.26572
6 2017-08-13 85.57209
7 2018-08-13 89.87105
8 2019-08-13 92.98643
9 2020-08-13 95.98289
10 2021-08-13 97.51558
11 2022-08-13 98.36271
12 2023-08-13 98.85595
13 2024-08-13 99.14414
14 2025-08-13 100.00000
15 2026-08-13 100.00000
16 2027-08-13 100.00000
17 2028-08-13 100.00000
18 2029-08-13 100.00000
19 2030-08-13 100.00000
Obviously the coverage cannot exceed 100%. But when I build the following model:
make_model = function(df) {
df$cap <- 100
m <- prophet(growth = 'logistic')
m <- add_regressor(m, name = 'Expln')
m <- fit.prophet(m, df)
return(m)
}
make_frcst = function(m, df) {
future <- make_future_dataframe(m, 10, freq = 'year', include_history = TRUE)
future$Expln <- df$Expln
future$cap <- 100
fcst <- predict(m, future)
return(fcst)
}
> model = make_model(data)
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
n.changepoints greater than number of observations. Using 6
> frcst = make_frcst(model, pred_data)
And plot the results, it gives this:
> plot(model, frcst)
>
Why did the forecast exceed the 100 mark although I made the cap
to be 100? The regressor also stops at 100. Can someone help me fix it?
Edit:
This is the dput()
of the data I used:
> dput(data)
structure(list(ds = structure(c(15565, 15930, 16295, 16660, 17026,
17391, 17756, 18121, 18487), class = "Date"), y = c(59, 59, 59,
65, 65, 67, 79, 82, 97.8), Expln = c(1, 21.79, 59.1, 71, 79.26572496,
85.57208641, 89.87104819, 92.98642948, 95.98289404)), class = "data.frame", row.names = c(NA,
-9L))
> dput(pred_data)
structure(list(ds = structure(c(15565, 15930, 16295, 16660, 17026,
17391, 17756, 18121, 18487, 18852, 19217, 19582, 19948, 20313,
20678, 21043, 21409, 21774, 22139), class = "Date"), Expln = c(1,
21.79, 59.1, 71, 79.26572496, 85.57208641, 89.87104819, 92.98642948,
95.98289404, 97.51557771, 98.36271286, 98.85595395, 99.14414047,
100, 100, 100, 100, 100, 100)), class = "data.frame", row.names = c(NA,
-19L))
Upvotes: 0
Views: 539
Reputation: 50678
On revisiting this post, here is what I would do:
rstanarm
to fit a Beta regression model of the form y ~ year + Expln
to the datayear
as well as Expln
. While choosing future values for year
is easy, you need to think harder about future values for Expln
. This would also be necessary when you do forecasting in prophet
with external regressors. In this example and for demonstration purposes, I simply duplicate the same set of Expln
values from the past 9 years.Let's start by loading necessary libraries
library(rstanarm)
library(lubridate)
library(tidyverse)
library(tidybayes)
We then create a new year
column, since your data seems to be yearly data. We also convert percentages to fractions [0, 1].
data <- data %>% mutate(year = year(as.Date(ds)), y = y / 100)
We can then fit a Beta regression model in rstanarm
(note that this requires you to install package betareg
).
fit <- stan_betareg(y ~ year + Expln, data = data)
We can now forecast; as mentioned in the beginning we simply forecast 9 years into the future, and copy Expln
values from the previous 9 years. We then show posterior draws from the expectation of the posterior predictive.
data %>%
mutate(year = map(year, `+`, c(0, 10))) %>%
unnest(year) %>%
arrange(year) %>%
add_epred_draws(fit) %>%
ggplot(aes(x = year, y = y)) +
stat_lineribbon(aes(y = .epred), alpha = 0.5) +
geom_point(data = data) +
scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
scale_fill_brewer(palette = "Greys") +
theme_minimal()
A couple of observations:
Expln
value which has been copied from the 2012 value. This is (probably obviously) wrong.As would have been the case with forecasting in prophet
in the presence of external regressors, a two stage forecasting process is probably required:
Expln
values based on domain-level knowledge or an independent forecasting model Expln ~ year
.Expln
for future years, then use those values and future year
values to forecast y
using this Beta regression model.Upvotes: 1