Reputation: 339
My data is a time series covering two years (one value per day, 360*2 rows).
I am trying to fit a SARIMA model with statsmodels. I picked the parameters (order and seasonal_order) arbitrarily: order=(1,0,1), seasonal_order=(0,1,0,360). The result fit my data very well.
But I don't really understand why. How should I choose the parameters order=(p,d,q) and seasonal_order=(P,D,Q,s) (is s=360 right)? Can I read them off the ACF or PACF plots, or from the AIC and BIC in the summary?
(I tried choosing the model with the lowest AIC, but that didn't work well.)
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Fit SARIMA(1,0,1)x(0,1,0,360); t3 is the daily series
SARIMA_1_0_1_010 = sm.tsa.SARIMAX(t3, order=(1,0,1), seasonal_order=(0,1,0,360)).fit()
print(SARIMA_1_0_1_010.summary())

# ACF and PACF of the residuals
residSARIMA = SARIMA_1_0_1_010.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(residSARIMA.values.squeeze(), lags=100, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(residSARIMA, lags=100, ax=ax2)

# Prediction from observation 700 through 1200 (past the end of the data)
pred = SARIMA_1_0_1_010.predict(700, 1200)
plt.figure(figsize=(22,10))
plt.plot(t3)
plt.plot(pred, "r")
import pandas as pd

max_p = 3
max_q = 3
max_d = 1
# With these maxima the seasonal order is always (0,0,0,360), i.e. no seasonal term
max_sp = 0
max_sq = 0
max_sd = 0
# p starts at 1, so there are max_p (not max_p + 1) choices for p
pattern = max_p*(max_q + 1)*(max_d + 1)*(max_sp + 1)*(max_sq + 1)*(max_sd + 1)
modelSelection = pd.DataFrame(index=range(pattern), columns=["model", "aic", "bic"])
season = 360
num = 0
for p in range(1, max_p + 1):
    for d in range(0, max_d + 1):
        for q in range(0, max_q + 1):
            for sp in range(0, max_sp + 1):
                for sd in range(0, max_sd + 1):
                    for sq in range(0, max_sq + 1):
                        sarima = sm.tsa.SARIMAX(
                            t3, order=(p,d,q),
                            seasonal_order=(sp,sd,sq,season),
                            enforce_stationarity=False,
                            enforce_invertibility=False
                        ).fit()
                        # .loc replaces the removed .ix indexer
                        modelSelection.loc[num, "model"] = "order=(" + str(p) + "," + str(d) + "," + str(q) + "), season=(" + str(sp) + "," + str(sd) + "," + str(sq) + ")"
                        modelSelection.loc[num, "aic"] = sarima.aic
                        modelSelection.loc[num, "bic"] = sarima.bic
                        num = num + 1
# Row(s) with the lowest AIC
modelSelection[modelSelection.aic == min(modelSelection.aic)]
The model selected this way didn't predict well.
Upvotes: 1
Views: 3231
Reputation: 3195
The basic problem here is that SARIMAX is not a very good model to use when the seasonal period is very long (see e.g. https://stats.stackexchange.com/questions/117953/very-high-frequency-time-series-analysis-seconds-and-forecasting-python-r/118050#118050).
In general, choosing the order of the model (e.g. p, q and P, Q) via information criteria is a good idea, but it is not a good idea to choose the differencing order (d or D) that way.
A Python package that can help automate model selection while using the SARIMAX model is https://github.com/tgsmith61591/pmdarima.
However, I should repeat that this will generally not work well for models with a season length of 360.
Upvotes: 3