h-y-jp

Reputation: 339

I would like to know how to choose the SARIMA (statsmodels) parameters order and seasonal_order.

My data is a time series covering two years, with one value per day (360 * 2 = 720 rows).

I am now trying the SARIMA model from statsmodels. I picked the parameters (order and seasonal_order) arbitrarily: order=(1,0,1), seasonal_order=(0,1,0,360). The result fit my data very well.

But I don't really understand why. How should I choose the parameters, i.e. order=(p,d,q) and seasonal_order=(P,D,Q,s), with s=360? Can I read them from the ACF or PACF plots, or from the AIC and BIC in the summary?

(I tried choosing the model with the lowest AIC, but it didn't work well.)


import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
# seasonal period set to 360 to match the parameters described above
SARIMA_1_0_1_010 = sm.tsa.SARIMAX(t3, order=(1,0,1), seasonal_order=(0,1,0,360)).fit()
print(SARIMA_1_0_1_010.summary())

residSARIMA = SARIMA_1_0_1_010.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(residSARIMA.values.squeeze(), lags=100, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(residSARIMA, lags=100, ax=ax2)


pred = SARIMA_1_0_1_010.predict(700, 1200)
plt.figure(figsize=(22,10))
plt.plot(t3)
plt.plot(pred, "r")


and


max_p = 3
max_q = 3
max_d = 1
max_sp = 0
max_sq = 0
max_sd = 0

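# p runs from 1 to max_p, so it contributes max_p combinations; the others start at 0
pattern = max_p*(max_q + 1)*(max_d + 1)*(max_sp + 1)*(max_sq + 1)*(max_sd + 1)
# include "bic" here, since it is filled in inside the loop
modelSelection = pd.DataFrame(index=range(pattern), columns=["model", "aic", "bic"])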

season = 360

num = 0

for p in range(1, max_p + 1):
    for d in range(0, max_d + 1):
        for q in range(0, max_q + 1):
            for sp in range(0, max_sp + 1):
                for sd in range(0, max_sd + 1):
                    for sq in range(0, max_sq + 1):
                        sarima = sm.tsa.SARIMAX(
                            t3, order=(p,d,q), 
                            seasonal_order=(sp,sd,sq,360), 
                            enforce_stationarity = False, 
                            enforce_invertibility = False
                        ).fit()
                        # .ix was removed from pandas; .loc[row, column] also avoids chained assignment
                        modelSelection.loc[num, "model"] = "order=(" + str(p) + ","+ str(d) + ","+ str(q) + "), season=("+ str(sp) + ","+ str(sd) + "," + str(sq) + ")"
                        modelSelection.loc[num, "aic"] = sarima.aic
                        modelSelection.loc[num, "bic"] = sarima.bic

                        num = num + 1

modelSelection[modelSelection.aic == min(modelSelection.aic)]

The selected model didn't predict well.

Upvotes: 1

Views: 3231

Answers (1)

cfulton

Reputation: 3195

The basic problem here is that SARIMAX is not a very good model to use when the seasonal effect is very long (see e.g. https://stats.stackexchange.com/questions/117953/very-high-frequency-time-series-analysis-seconds-and-forecasting-python-r/118050#118050).

In general, choosing the order of the model (e.g. p, q and P, Q) via information criteria is a good idea, but it is not a good idea to choose the differencing order (d or D) that way.
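As a rough illustration of that split, here is a minimal sketch that holds d fixed and compares only p and q by AIC. The helper names (candidate_orders, select_by_aic, sarimax_aic) are made up for this example and are not part of statsmodels; sarimax_aic assumes statsmodels is installed and that the series is already loaded.

```python
from itertools import product


def candidate_orders(max_p, max_q, d):
    """All (p, d, q) triples with d held fixed; d is never searched over."""
    return [(p, d, q) for p, q in product(range(max_p + 1), range(max_q + 1))]


def select_by_aic(orders, aic_of):
    """Return (best_order, best_aic).

    aic_of is any callable mapping an order tuple to an AIC value.
    Candidates whose fit raises an exception are skipped.
    """
    scored = []
    for order in orders:
        try:
            scored.append((aic_of(order), order))
        except Exception:
            continue
    if not scored:
        raise ValueError("no candidate model could be fitted")
    best_aic, best_order = min(scored)
    return best_order, best_aic


def sarimax_aic(series, order, seasonal_order=(0, 0, 0, 0)):
    """Hypothetical scorer: fit SARIMAX once and return its AIC."""
    import statsmodels.api as sm  # imported lazily so the helpers above stay dependency-free

    result = sm.tsa.SARIMAX(
        series, order=order, seasonal_order=seasonal_order
    ).fit(disp=False)
    return result.aic
```

With the series from the question, this could be called as select_by_aic(candidate_orders(3, 3, d=0), lambda o: sarimax_aic(t3, o)). The value of d (and D) would be decided beforehand, for example by inspecting the series or running a unit root test, rather than by comparing AIC across different differencing orders.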

A Python package that can help automate model selection while using the SARIMAX model is https://github.com/tgsmith61591/pmdarima.

However, I should repeat that this will generally not work well for models with a season length of 360.
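One alternative that is often suggested for long seasonal periods is to drop the seasonal ARIMA terms and model the yearly pattern with Fourier terms passed as exogenous regressors. The sketch below only builds those regressors; the function name fourier_terms and the choice of 3 harmonics are assumptions for illustration, not something taken from the question or from statsmodels.

```python
import math


def fourier_terms(n_obs, period, n_harmonics, start=0):
    """Build sin/cos regressors for time steps start .. start + n_obs - 1.

    Each row is [sin(1), cos(1), sin(2), cos(2), ...], where harmonic k
    uses the angle 2 * pi * k * t / period.
    """
    rows = []
    for t in range(start, start + n_obs):
        row = []
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Regressors for the 720 observed days and for a 30-day forecast horizon.
exog_train = fourier_terms(720, period=360, n_harmonics=3)
exog_future = fourier_terms(30, period=360, n_harmonics=3, start=720)
```

These could then be passed as sm.tsa.SARIMAX(t3, exog=exog_train, order=(1,0,1)) with no seasonal_order, and the forecast requested with forecast(steps=30, exog=exog_future). The number of harmonics is itself something to tune, for instance by comparing AIC across a few values.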

Upvotes: 3
