Reputation: 378
I have created a function which allows me to carry out time series forecasting using the fable
package. The idea of the function was to analyse observed vs predicted values after a particular date/event. Here is a mock data frame which generates a column of dates:-
set.seed(1)
df <- data.frame(Date = sort(sample(seq(as.Date('2018/01/01'), as.Date('2020/09/17'), by="day"),1368883, replace = T)))
And here is the function I created. You specify the data, then the date of the event, then the forecast period in days and lastly a title for your graph.
event_analysis<-function(data,eventdate,period,title){
require(dplyr)
require(tsibble)
require(fable)
require(fabletools)
require(imputeTS)
require(ggplot2)
data_count<-data%>%
group_by(Date)%>%
summarise(Count=n())
data_count<-as_tsibble(data_count)
data_count<-na_mean(data_count)
train <- data_count %>%
#sample_frac(0.8)
filter(Date<=as.Date(eventdate))
fit <- train %>%
model(
ets = ETS(Count),
arima = ARIMA(Count),
snaive = SNAIVE(Count)
) %>%
mutate(mixed = (ets + arima + snaive) / 3)
fc <- fit %>% forecast(h = period)
forecastplot<-fc %>%
autoplot(data_count, level = NULL)+ggtitle(title)+
geom_vline(xintercept = as.Date(eventdate),linetype="dashed",color="red")+
labs(caption = "Red dashed line = Event occurrence")
fc_accuracy<-accuracy(fc,data_count)
#obs<-data_count
#colnames(obs)[2]<-"Observed"
#obs_pred<-merge(data_count,fc_accuracy, by="Date")
return(list(forecastplot,fc_accuracy,fc))
}
And in one run, I specify the df
, the date of the event, the number of days that I want to forecast (3 weeks), then the title:-
event_analysis(df, "2020-01-01",21,"Event forecast")
Which will print this outcome and plot:-
I concede that the mock data I made isn't totally ideal but the function works well on my real-world data.
Here is what I want to achieve. I would like this output that has been made to come out of the function, but in addition, I would like an additional graph which "zooms in" on the period that has been forecasted, for 2 reasons:-
So, an additional graph (along with the original full forecast) that would look like this, perhaps in the one output, "multiplot" style:-
The other thing would be to print another output which shows the observed values in the test set against the predicted values from the models used in the forecasting.
These are basically the two additional things I want to add to my function but I am not sure how to go about this. Any help is massively appreciated :) .
Upvotes: 0
Views: 822
Reputation: 7858
I suppose you could rewrite it this way. I made a couple of adjustments to help you out.
set.seed(1)
df <- data.frame(Date = sort(sample(seq(as.Date('2018/01/01'), as.Date('2020/09/17'), by="day"),1368883, replace = T)))
event_analysis <- function(data, eventdate, period, title){
# in the future, you may think to move them out
library(dplyr)
library(tsibble)
library(fable)
library(fabletools)
library(imputeTS)
library(ggplot2)
# convert at the beginning
eventdate <- as.Date(eventdate)
# more compact sintax
data_count <- count(data, Date, name = "Count")
# better specify the date variable to avoid the message
data_count <- as_tsibble(data_count, index = Date)
# you need to complete missing dates, just in case
data_count <- tsibble::fill_gaps(data_count)
data_count <- na_mean(data_count)
train <- data_count %>%
filter(Date <= eventdate)
test <- data_count %>%
filter(Date > eventdate, Date <= (eventdate+period))
fit <- train %>%
model(
ets = ETS(Count),
arima = ARIMA(Count),
snaive = SNAIVE(Count)
) %>%
mutate(mixed = (ets + arima + snaive) / 3)
fc <- fit %>% forecast(h = period)
# your plot
forecastplot <- fc %>%
autoplot(data_count, level = NULL) +
ggtitle(title) +
geom_vline(xintercept = as.Date(eventdate), linetype = "dashed", color = "red") +
labs(caption = "Red dashed line = Event occurrence")
# plot just forecast and test
zoomfcstplot <- autoplot(fc) + autolayer(test, .vars = Count)
fc_accuracy <- accuracy(fc,data_count)
### EDIT: ###
# results vs test
res <- fc %>%
as_tibble() %>%
select(-Count) %>%
tidyr::pivot_wider(names_from = .model, values_from = .mean) %>%
inner_join(test, by = "Date")
##############
return(list(forecastplot = forecastplot,
zoomplot = zoomfcstplot,
accuracy = fc_accuracy,
forecast = fc,
results = res))
}
event_analysis(df,
eventdate = "2020-01-01",
period = 21,
title = "Event forecast")
Output:
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Carico il pacchetto richiesto: fabletools
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
#> $forecastplot
#>
#> $zoomplot
#>
#> $accuracy
#> # A tibble: 4 x 9
#> .model .type ME RMSE MAE MPE MAPE MASE ACF1
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 arima Test -16.8 41.8 35.2 -1.31 2.61 0.791 0.164
#> 2 ets Test -16.8 41.8 35.2 -1.31 2.61 0.791 0.164
#> 3 mixed Test -21.9 44.7 36.8 -1.68 2.73 0.825 -0.0682
#> 4 snaive Test -32.1 57.3 46.6 -2.43 3.45 1.05 -0.377
#>
#> $forecast
#> # A fable: 84 x 4 [1D]
#> # Key: .model [4]
#> .model Date Count .mean
#> <chr> <date> <dist> <dbl>
#> 1 ets 2020-01-02 N(1383, 1505) 1383.
#> 2 ets 2020-01-03 N(1383, 1505) 1383.
#> 3 ets 2020-01-04 N(1383, 1505) 1383.
#> 4 ets 2020-01-05 N(1383, 1505) 1383.
#> 5 ets 2020-01-06 N(1383, 1505) 1383.
#> 6 ets 2020-01-07 N(1383, 1505) 1383.
#> 7 ets 2020-01-08 N(1383, 1505) 1383.
#> 8 ets 2020-01-09 N(1383, 1505) 1383.
#> 9 ets 2020-01-10 N(1383, 1505) 1383.
#> 10 ets 2020-01-11 N(1383, 1505) 1383.
#> # ... with 74 more rows
#>
#> $results
#> # A tibble: 21 x 6
#> Date ets arima snaive mixed Count
#> <date> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 2020-01-02 1383. 1383. 1386 1384. 1350
#> 2 2020-01-03 1383. 1383. 1366 1377. 1398
#> 3 2020-01-04 1383. 1383. 1426 1397. 1357
#> 4 2020-01-05 1383. 1383. 1398 1388. 1415
#> 5 2020-01-06 1383. 1383. 1431 1399. 1399
#> 6 2020-01-07 1383. 1383. 1431 1399. 1346
#> 7 2020-01-08 1383. 1383. 1350 1372. 1299
#> 8 2020-01-09 1383. 1383. 1386 1384. 1303
#> 9 2020-01-10 1383. 1383. 1366 1377. 1365
#> 10 2020-01-11 1383. 1383. 1426 1397. 1328
#> # ... with 11 more rows
Upvotes: 1