Reputation: 1
I'm hoping you might be able to help me with an issue that I'm having when trying to fit an ARIMA model for a school project that I`m working on.
The data that I'm using shows weekly sales figures starting from 2019 and going till 2021. My goal is to produce a forecast for the remainder of 2021 based on those figures. As my dataset is comprised of weekly data and the seasonality based on the ACF and PACF plots seems to occur once a year I've set the "S =" argument from the sarima() function to 52. The problem is that every time I try to run the model, I keep getting an error and I can't figure out any way of getting rid of it.
I've tried to use the same code with other data sets on the datacamp environment with "S = 52" and the model runs without a problem. I'm hoping that somebody might be able to give me some advice on how to deal with this issue. Thank you!
P.S.
If the "S =" argument is set lower than 35 then the model will run. (Just in case this information might help)
####Install Packages####
library(tidyverse)
library(zoo)
library(xts)
library(lubridate)
library(astsa)
library(tseries)
library(forecast)
######Load and inspect the data########
unit_sales <- structure(list(Date = c("30/03/2019", "06/04/2019", "13/04/2019",
"20/04/2019", "27/04/2019", "04/05/2019", "11/05/2019", "18/05/2019",
"25/05/2019", "01/06/2019", "08/06/2019", "15/06/2019", "22/06/2019",
"29/06/2019", "06/07/2019", "13/07/2019", "20/07/2019", "27/07/2019",
"03/08/2019", "10/08/2019", "17/08/2019", "24/08/2019", "31/08/2019",
"07/09/2019", "14/09/2019", "21/09/2019", "28/09/2019", "05/10/2019",
"12/10/2019", "19/10/2019", "26/10/2019", "02/11/2019", "09/11/2019",
"16/11/2019", "23/11/2019", "30/11/2019", "07/12/2019", "14/12/2019",
"21/12/2019", "28/12/2019", "04/01/2020", "11/01/2020", "18/01/2020",
"25/01/2020", "01/02/2020", "08/02/2020", "15/02/2020", "22/02/2020",
"29/02/2020", "07/03/2020", "14/03/2020", "21/03/2020", "28/03/2020",
"04/04/2020", "11/04/2020", "18/04/2020", "25/04/2020", "02/05/2020",
"09/05/2020", "16/05/2020", "23/05/2020", "30/05/2020", "06/06/2020",
"13/06/2020", "20/06/2020", "27/06/2020", "04/07/2020", "11/07/2020",
"18/07/2020", "25/07/2020", "01/08/2020", "08/08/2020", "15/08/2020",
"22/08/2020", "29/08/2020", "05/09/2020", "12/09/2020", "19/09/2020",
"26/09/2020", "03/10/2020", "10/10/2020", "17/10/2020", "24/10/2020",
"31/10/2020", "07/11/2020", "14/11/2020", "21/11/2020", "28/11/2020",
"05/12/2020", "12/12/2020", "19/12/2020", "26/12/2020", "02/01/2021",
"09/01/2021", "16/01/2021", "23/01/2021", "30/01/2021", "06/02/2021",
"13/02/2021", "20/02/2021", "27/02/2021", "06/03/2021", "13/03/2021",
"20/03/2021", "27/03/2021"), Units = c(967053.4, 633226.9, 523264,
473914.2, 418087.5, 504342.2, 477819, 415650, 406972.3, 429791.4,
441724.4, 453221.8, 402005.8, 414993.4, 381457.2, 391218.7, 486925.9,
409791.8, 399217.9, 409210, 478121.2, 495549.1, 503918.3, 535949.5,
517450.4, 523036.8, 616456.9, 665979.3, 705201.5, 700168.1, 763538.8,
875501.2, 886586.6, 967806, 1094195, 1285950.5, 1450436.1, 1592162.8,
2038160.5, 1676988.8, 1026193.7, 820405.5, 738643.9, 669657.6,
720287.7, 673194.1, 754102.5, 639532, 680413.6, 710702, 711722.8,
834036.8, 427817.2, 505849.6, 441047.4, 439411, 487634.1, 594594.8,
548796.7, 565682, 528275.2, 448092, 467780.1, 544160.3, 538275.8,
485055.5, 592097.3, 537514.3, 493381.9, 445280.8, 448111.2, 419263.4,
457125.7, 561169.6, 704575.3, 656423.1, 653751.3, 622937.7, 718022.8,
768901.9, 793443, 814604.2, 876269.3, 982921.8, 1064920.7, 1201494.4,
1337374.9, 1619595.8, 1734773.8, 1624071, 1777832.3, 1648201.9,
1106253.8, 940141.1, 796129.1, 853392.9, 932059.1, 905990.4,
981188.6, 907823.9, 956098.8, 1003966.7, 1331125.5, 805593.6,
799486.2)), class = "data.frame", row.names = c(NA, -105L))
####Convert date column to date format
unit_sales$Date <- as.Date(unit_sales$Date, format ="%d/%m/%Y" )
###Convert to xts object
unit_sales_xts <- xts(unit_sales, unit_sales$Date)
periodicity(unit_sales_xts)
###Convert to ts object
unit_sales_vector <- unit_sales$Units
unit_sales_ts <- ts(unit_sales_vector, start = decimal_date(as.Date("2019-03-30")), frequency = 52)
###Plot data
ts.plot(unit_sales_ts)
###Make data stationary and plot it
ts.plot(diff(log(unit_sales_ts)))
###Plot ACF and PACF
pacf_plot <- pacf(diff(log(unit_sales_ts)), lag.max = 105)
acf_plot <- acf(diff(log(unit_sales_ts)), lag.max = 105)
###Test if data is stationary
adf.test(diff(log(unit_sales_ts)))
###Fit ARIMA model
sarima(unit_sales_ts, p = 1, d = 1, q = 0)
sarima.for(unit_sales_ts, n.ahead = 39, 1,1,0)
**###Fit Seasona ARIMA model - THIS IS WHERE THE ERROR OCCURS -**
sarima(unit_sales_ts, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
###Forecast using the above model
sarima.for(unit_sales_ts,n.ahead = 39, p = 1, d = 1, q = 0, P = 0, D = 1, Q = 0, S = 52)
Upvotes: 0
Views: 78
Reputation: 4344
I tested you code and get the same error, so I read into the astsa::sarima() implementation and found these two lines, concerning the use of seasonality and your data:
alag <- max(10 + sqrt(num), 3 * S)
nlag <- ifelse(S < 7, 20, 3 * S)
Without reading the whole implementation, I deduce, that the package creator suposes 3 times the season size for the parameter to work correctly. Which is not your case with 105 observation when using S = 52. Now if that is a bug or just not well documented or properly treated in the code, I can not tell you. I do not know which version of the package datacamp runs and what is the update history of the package itself. But we can assume that at least one of the two lines causes the error since all values from 35 for S cause the same error.
One way to work arround is printing the implementation code of the function to console (just write "astsa::sarima" and hit enter, without the " though), copy it to modify the lines (I tried to use 2 * instead of 3 *) and assing it to a function name of your own. Then the code runs. Also you could try the print at the datacamp environment and compare to you local installation.
Upvotes: 1