Reputation: 2520
I am trying to cross-validate a Prophet model in R. The problem is that this package does not work well with monthly data.
I managed to build the model and even added a custom monthly seasonality, as recommended by the authors of this tool.
But I cannot cross-validate monthly data. I tried to follow the recommendations in the GitHub issue, but I am missing something.
Currently my code looks like this:
model1_cv <- cross_validation(model1, initial = 156, period = 365/12, as.difftime(horizon = 365/12, units = "days"))
Updated:
Based on the answer to this question, I visualized the CV results. There are some problems here. I used full data and partial data.
Also, the metrics do not look that good.
Upvotes: 0
Views: 1379
Reputation: 4344
I tested a bit with training data from the package, and from what I understood the package is not really well suited for monthly forecasts. This part: [...] as.difftime(365/12, units = "days") [...] seems to have been given just to express the length of a month as 30-something days. That means you can use it instead of plain 365/12 for "period" and/or "horizon". One thing I noticed is that both arguments are documented as integers, but inside the function they are converted with as.difftime(), so they are actually doubles.
library(dplyr)
library(prophet)
library(data.table)
#training data
df <- data.table::fread("ds y
1992-01-01 146376
1992-02-01 147079
1992-03-01 159336
1992-04-01 163669
1992-05-01 170068
1992-06-01 168663
1992-07-01 169890
1992-08-01 170364
1992-09-01 164617
1992-10-01 173655
1992-11-01 171547
1992-12-01 208838
1993-01-01 153221
1993-02-01 150087
1993-03-01 170439
1993-04-01 176456
1993-05-01 182231
1993-06-01 181535
1993-07-01 183682
1993-08-01 183318
1993-09-01 177406
1993-10-01 182737
1993-11-01 187443
1993-12-01 224540
1994-01-01 161349
1994-02-01 162841
1994-03-01 192319
1994-04-01 189569
1994-05-01 194927
1994-06-01 197946
1994-07-01 193355
1994-08-01 202388
1994-09-01 193954
1994-10-01 197956
1994-11-01 202520
1994-12-01 241111
1995-01-01 175344
1995-02-01 172138
1995-03-01 201279
1995-04-01 196039
1995-05-01 210478
1995-06-01 211844
1995-07-01 203411
1995-08-01 214248
1995-09-01 202122
1995-10-01 204044
1995-11-01 212190
1995-12-01 247491
1996-01-01 185019
1996-02-01 192380
1996-03-01 212110
1996-04-01 211718
1996-05-01 226936
1996-06-01 217511
1996-07-01 218111")
df <- df %>%
  dplyr::mutate(ds = as.Date(ds))
model <- prophet::prophet(df)
(tscv.myfit <- prophet::cross_validation(model, horizon = 365/12, units = "days", period = 365/12, initial = 365/12 * 12 * 3))
y ds yhat yhat_lower yhat_upper cutoff
1: 175344 1995-01-01 170988.8 170145.9 171828.0 1994-12-31 02:00:00
2: 172138 1995-02-01 178117.4 176975.2 179070.2 1995-01-30 12:00:00
3: 201279 1995-03-01 211462.8 210277.4 212670.8 1995-01-30 12:00:00
4: 196039 1995-04-01 200113.9 198079.5 201977.8 1995-03-01 22:00:00
5: 210478 1995-05-01 202100.5 200390.8 203797.9 1995-04-01 08:00:00
6: 211844 1995-06-01 208330.5 206229.9 210497.4 1995-05-01 18:00:00
7: 203411 1995-07-01 202563.8 200786.5 204313.0 1995-06-01 04:00:00
8: 214248 1995-08-01 214639.6 212748.3 216461.3 1995-07-01 14:00:00
9: 202122 1995-09-01 204954.0 203048.9 206768.4 1995-08-31 12:00:00
10: 204044 1995-10-01 205097.5 203209.7 206882.3 1995-09-30 22:00:00
11: 212190 1995-11-01 213586.7 211728.1 215617.6 1995-10-31 08:00:00
12: 247491 1995-12-01 251518.8 249708.2 253589.2 1995-11-30 18:00:00
13: 185019 1996-01-01 182403.7 180520.1 184494.7 1995-12-31 04:00:00
14: 192380 1996-02-01 184722.9 182772.7 186686.9 1996-01-30 14:00:00
15: 212110 1996-03-01 205020.1 202823.2 206996.9 1996-01-30 14:00:00
16: 211718 1996-04-01 214514.0 211891.9 217175.3 1996-03-31 14:00:00
17: 226936 1996-05-01 218845.2 216133.8 221420.4 1996-03-31 14:00:00
18: 217511 1996-06-01 218672.2 216007.8 221459.9 1996-05-31 14:00:00
19: 218111 1996-07-01 221156.1 218540.7 224184.1 1996-05-31 14:00:00
The cutoffs are not as regular as one would expect. I guess this is somehow due to using the average number of days per month, though I could not figure out the exact logic. You can replace 365/12 with as.difftime(365/12, units = "days") and will get the same result.
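To make the drift visible, here is a minimal base R sketch (no prophet needed) that steps back from the last observation in fixed 365/12-day increments and compares that with stepping back in calendar months. The variable names are only illustrative, and this is not the package's actual cutoff logic, just the date arithmetic behind it. The first fixed-length step back lands on the 1996-05-31 14:00:00 cutoff seen in the output above.

```r
# Last observation in the training data above.
last_ds <- as.POSIXct("1996-07-01", tz = "UTC")

# A fixed-length "month": 365/12 days, i.e. 30 days and 10 hours.
month_len <- as.difftime(365 / 12, units = "days")

# Stepping back in fixed-length months lands at odd times of day
# and drifts relative to the first of each month.
fixed_step <- last_ds - month_len * (1:4)

# Stepping back in calendar months always lands on the first of a month.
calendar_step <- seq(as.Date("1996-07-01"), by = "-1 month", length.out = 5)[-1]

print(format(fixed_step, "%Y-%m-%d %H:%M"))
print(calendar_step)
```

Because no fixed number of days matches every calendar month, any constant period will sooner or later fall on the wrong side of a month boundary.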
If you use (365+365+365+366) / 48 instead, to account for 29 February, you get a slightly different average number of days per month, and this leads to a different output:
(tscv.myfit_2 <- prophet::cross_validation(model, horizon = (365+365+365+366)/48, units = "days", period = (365+365+365+366)/48, initial = (365+365+365+366)/48 * 12 * 3))
y ds yhat yhat_lower yhat_upper cutoff
1: 172138 1995-02-01 178117.4 177075.3 179203.9 1995-01-29 13:30:00
2: 201279 1995-03-01 211462.8 210340.5 212607.3 1995-01-29 13:30:00
3: 196039 1995-04-01 200113.9 198022.6 202068.1 1995-03-31 13:30:00
4: 210478 1995-05-01 204100.2 202009.8 206098.7 1995-03-31 13:30:00
5: 211844 1995-06-01 208330.5 206114.5 210515.8 1995-05-31 13:30:00
6: 203411 1995-07-01 202606.0 200319.1 204663.4 1995-05-31 13:30:00
7: 214248 1995-08-01 214639.6 212684.4 216495.7 1995-07-31 22:30:00
8: 202122 1995-09-01 204954.0 203127.7 206951.0 1995-08-31 09:00:00
9: 204044 1995-10-01 205097.5 203285.3 207036.5 1995-09-30 19:30:00
10: 212190 1995-11-01 213586.7 211516.8 215516.2 1995-10-31 06:00:00
11: 247491 1995-12-01 251518.8 249658.3 253590.1 1995-11-30 16:30:00
12: 185019 1996-01-01 182403.7 180359.7 184399.2 1995-12-31 03:00:00
13: 192380 1996-02-01 184722.9 182652.4 186899.8 1996-01-30 13:30:00
14: 212110 1996-03-01 205020.1 203040.3 207171.9 1996-01-30 13:30:00
15: 211718 1996-04-01 214514.0 211942.6 217252.6 1996-03-31 13:30:00
16: 226936 1996-05-01 218845.2 216203.1 221506.5 1996-03-31 13:30:00
17: 217511 1996-06-01 218672.2 215823.9 221292.4 1996-05-31 13:30:00
18: 218111 1996-07-01 221156.1 218236.7 223862.0 1996-05-31 13:30:00
From this behaviour I would say the workaround is not ideal, especially depending on how exact you want the cross-validation to be in terms of rolling months. If you need the cutoff points to be exact, you could write your own function that always predicts one month ahead from the starting point, collect these results and build the final comparison. I would trust this approach more than the workaround.
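A minimal sketch of such a hand-rolled function could look like the following. It assumes the data has exactly one row per month with columns ds and y, as in the example above. The names rolling_monthly_cv and prophet_one_step are made up for illustration and are not part of prophet. The forecaster is passed in as an argument so the loop can be checked with a simpler model first.

```r
# Fit Prophet on the training rows and forecast the requested date(s).
# Only prophet::prophet() and predict() are used from the package.
prophet_one_step <- function(train, new_ds) {
  m <- prophet::prophet(train)
  fc <- predict(m, data.frame(ds = new_ds))
  fc$yhat
}

# Rolling-origin CV: train on the first i months, predict month i + 1.
# Assumes one row per month; cutoffs are then actual observation dates.
rolling_monthly_cv <- function(df, initial_months, fit_predict = prophet_one_step) {
  df <- as.data.frame(df)
  df <- df[order(df$ds), ]
  n <- nrow(df)
  stopifnot(initial_months >= 1, initial_months < n)

  results <- vector("list", n - initial_months)
  for (i in seq(initial_months, n - 1)) {
    train <- df[seq_len(i), ]   # everything up to and including the cutoff
    test  <- df[i + 1, ]        # the next monthly observation
    results[[i - initial_months + 1]] <- data.frame(
      cutoff = train$ds[i],
      ds     = test$ds,
      y      = test$y,
      yhat   = fit_predict(train, test$ds)
    )
  }
  do.call(rbind, results)
}
```

With the data above, something like cv <- rolling_monthly_cv(df, initial_months = 36) followed by mean(abs(cv$y - cv$yhat) / cv$y) would give a one-month-ahead MAPE. It may also be worth checking whether your installed prophet version offers a cutoffs argument in cross_validation(), which, if available, would let you pass exact dates directly.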
Upvotes: 1