Manasa Tallam

Reputation: 33

How to automate SARIMA model for time series forecasting?

I am trying to find the right p, d, q parameters for time series forecasting with SARIMA. I need to forecast house prices for 1000 zip codes. The problem is that grid search takes too much time, and I can't manually inspect the ACF/PACF plots for each zip code, so the selection has to be automated.

I ran a grid search over the 8 (p, d, q) combinations crossed with the 8 seasonal combinations, and kept the set of parameters with the lowest AIC.

import itertools

import pandas as pd
import statsmodels.api as sm

# Each of p, d, q takes the values 0 and 1
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
# Seasonal orders reuse the same grid with a seasonal period of 12
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in pdq]
parameters = []
for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            model = sm.tsa.statespace.SARIMAX(y_new,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = model.fit()
            #print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except Exception:
            # Skip combinations that fail to fit
            continue
        aic = results.aic
        parameters.append([param, param_seasonal, aic])
result_table = pd.DataFrame(parameters)
result_table.columns = ['parameters', 'parameters_seasonal', 'aic']
# Sort in ascending order: the lower the AIC, the better
result_table = result_table.sort_values(by='aic', ascending=True).reset_index(drop=True)

I can't get a model which can beat the naive forecast. Can you give me some direction on how to proceed?

Upvotes: 2

Views: 7274

Answers (1)

Michael Grogan

Reputation: 1026

Your best bet is to use the pyramid library (since renamed pmdarima), which automates the selection of the p, d, q parameters. You would need to reshape the data so that each of the 1000 time series can be fed in separately, but here is an example of how it runs on a single time series.
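
The step of feeding in many series could be sketched roughly as below. This is only an illustration: `fit_all_series`, `mean_model`, and the toy zip-code data are hypothetical names made up for the example, and the stand-in fitting function is there only so the sketch runs without any forecasting library installed.

```python
from typing import Any, Callable, Dict, Mapping, Sequence, Tuple


def fit_all_series(
    series_by_key: Mapping[str, Sequence[float]],
    fit_fn: Callable[[Sequence[float]], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Fit one model per series and record failures instead of stopping."""
    models: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for key, values in series_by_key.items():
        try:
            models[key] = fit_fn(values)
        except Exception as exc:
            # One problematic series should not abort the whole batch
            failures[key] = repr(exc)
    return models, failures


def mean_model(values: Sequence[float]) -> float:
    """Hypothetical stand-in for a real fitting function."""
    if len(values) < 3:
        raise ValueError("series too short")
    return sum(values) / len(values)


# Toy data keyed by zip code (made-up numbers)
demo = {"10001": [1.0, 2.0, 3.0], "10002": [5.0]}
models, failures = fit_all_series(demo, mean_model)
print(models)          # fitted "models" per zip code
print(list(failures))  # zip codes whose fit raised an error
```

With the library installed, `fit_fn` could be something like `lambda y: auto_arima(y, seasonal=True, m=12, stepwise=True, error_action='ignore', suppress_warnings=True)`, though that call is not tested here.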

Suppose we have a dataset of maximum recorded daily temperature over time, and the objective is to automate the selection of p, d, q parameters for ARIMA. This could be achieved as follows:

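import matplotlib.pyplot as plt
from pyramid.arima import auto_arima
from pyramid.arima.stationarity import ADFTest

# Augmented Dickey-Fuller test at the 5% significance level
adf_test = ADFTest(alpha=0.05)
adf_test.is_stationary(series)

# Chronological split: earlier observations for training, later ones for testing
train, test = series[1:741], series[742:927]
train.shape
test.shape

plt.plot(train)
plt.plot(test)
plt.title("Training and Test Data")
plt.show()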

[Plot: training and test data]

As the trace below shows, auto_arima selects the configuration with the lowest AIC:

>>> Arima_model=auto_arima(train, start_p=1, start_q=1, max_p=8, max_q=8, start_P=0, start_Q=0, max_P=8, max_Q=8, m=12, seasonal=True, trace=True, d=1, D=1, error_action='warn', suppress_warnings=True, random_state = 20, n_fits=30)
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=-667.202, BIC=-648.847, Fit time=3.710 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=-270.700, BIC=-261.522, Fit time=0.354 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=-625.446, BIC=-607.090, Fit time=2.365 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=-1090.370, BIC=-1072.014, Fit time=7.584 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=-1088.657, BIC=-1065.712, Fit time=10.024 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 0, 12); AIC=-653.939, BIC=-640.172, Fit time=1.733 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=-1087.889, BIC=-1064.944, Fit time=25.853 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=-1087.188, BIC=-1059.655, Fit time=31.205 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=-1105.233, BIC=-1082.288, Fit time=10.266 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 1, 1, 12); AIC=-887.349, BIC=-868.994, Fit time=9.558 seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=-1086.931, BIC=-1059.397, Fit time=11.649 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 1, 12); AIC=-724.814, BIC=-711.047, Fit time=4.372 seconds
Fit ARIMA: order=(2, 1, 2) seasonal_order=(0, 1, 1, 12); AIC=-1085.480, BIC=-1053.358, Fit time=17.619 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 1, 12); AIC=-1072.933, BIC=-1045.400, Fit time=13.924 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 2, 12); AIC=-1102.926, BIC=-1075.392, Fit time=28.082 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 2, 12); AIC=-1102.342, BIC=-1070.219, Fit time=35.426 seconds
Fit ARIMA: order=(2, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=-1010.837, BIC=-983.303, Fit time=8.926 seconds
Total fit time: 222.656 seconds
>>> 
>>> Arima_model.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                                 Statespace Model Results                                 
==========================================================================================
Dep. Variable:                                  y   No. Observations:                  740
Model:             SARIMAX(1, 1, 1)x(0, 1, 1, 12)   Log Likelihood                 557.617
Date:                            Thu, 14 Mar 2019   AIC                          -1105.233
Time:                                    16:33:59   BIC                          -1082.288
Sample:                                         0   HQIC                         -1096.379
                                            - 740                                         
Covariance Type:                              opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept   1.359e-06   6.75e-06      0.201      0.840   -1.19e-05    1.46e-05
ar.L1          0.1558      0.034      4.575      0.000       0.089       0.223
ma.L1         -0.9847      0.013    -75.250      0.000      -1.010      -0.959
ma.S.L12      -0.9933      0.092    -10.837      0.000      -1.173      -0.814
sigma2         0.0118      0.001     11.259      0.000       0.010       0.014
===================================================================================
Ljung-Box (Q):                       54.38   Jarque-Bera (JB):              3179.66
Prob(Q):                              0.06   Prob(JB):                         0.00
Heteroskedasticity (H):               0.77   Skew:                            -1.46
Prob(H) (two-sided):                  0.04   Kurtosis:                        12.82
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
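
The AIC and BIC reported in the summary above can be checked by hand from the log likelihood, using AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). A minimal sketch follows; the parameter count of 5 is read off the coefficient table, and the effective sample size of 727 (740 observations minus the 13 lost to d=1 and D=1 with m=12) is an assumption that happens to reproduce the reported BIC.

```python
import math

log_likelihood = 557.617  # "Log Likelihood" in the summary
k_params = 5              # intercept, ar.L1, ma.L1, ma.S.L12, sigma2
n_obs = 740               # "No. Observations" in the summary
n_lost = 1 + 1 * 12       # ASSUMPTION: observations lost to d=1 and D=1, m=12
n_effective = n_obs - n_lost

# AIC penalises each estimated parameter by 2
aic = 2 * k_params - 2 * log_likelihood
# BIC penalises each estimated parameter by ln(sample size)
bic = k_params * math.log(n_effective) - 2 * log_likelihood

print(f"AIC = {aic:.3f}")
print(f"BIC = {bic:.3f}")
```

Both values agree with the summary to within the rounding of the printed log likelihood.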

If you are familiar with R, you can also use the auto.arima function from the forecast package. In fact, I would recommend doing so, as there are occasions where it might give you a better automated configuration than Pyramid (which was developed more recently).

That said, pyramid would help to automate things greatly for you.
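
Since the question uses the naive forecast as its benchmark, whichever model is selected should be scored against that baseline on held-out data. A rough sketch of such a comparison, with hypothetical helper names and made-up numbers (a seasonal period of 3 is used only to keep the toy data short):

```python
from typing import List, Sequence


def naive_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def seasonal_naive_forecast(train: Sequence[float], horizon: int, m: int = 12) -> List[float]:
    """Repeat the last full season; needs at least m observations."""
    last_season = list(train[-m:])
    return [last_season[i % m] for i in range(horizon)]


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy data with a repeating pattern of length 3 (made-up numbers)
train = [10, 12, 14, 11, 13, 15]
test = [12, 14, 16]

mae_naive = mean_absolute_error(test, naive_forecast(train, len(test)))
mae_seasonal = mean_absolute_error(test, seasonal_naive_forecast(train, len(test), m=3))
print(f"naive MAE: {mae_naive:.3f}")
print(f"seasonal naive MAE: {mae_seasonal:.3f}")
```

The same `mean_absolute_error` call can then be applied to the model's out-of-sample predictions for the test period, so that all three are compared on identical data.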

Upvotes: 1
