enriicoo

Reputation: 53

Why doesn't statsmodels ARIMA seem to discard values when differencing?

I can't work out whether I'm understanding the ARIMA model from statsmodels correctly. Even though I'm differencing, and the series does need differencing, the model does not seem to drop the first N values corresponding to the differencing order. This shows in the residuals and in the summary, which still counts 24 observations, whereas (as far as I understand time series) the value(s) lost to differencing should be dropped.
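To make concrete what I mean: differencing a series by hand loses its first value, so I expected the fitted model to report one observation fewer. A minimal sketch of that expectation (with a short made-up series, not my real data):

import pandas as pd

# Illustrative values only, not the series from the full example below
s = pd.Series([908000, 902000, 930000, 938000])
diffed = s.diff().dropna()   # first-order differencing drops the first value
print(len(s), len(diffed))   # prints: 4 3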

Here's the full example:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

dates = pd.date_range(start='1970-01-01', periods=24, freq='YS')
time_series = [908000, 902000, 930000, 938000, 946000, 961000, 982000, 1002000,
               1024000, 1006000, 1031000, 1047000, 1077000, 1103000, 1136000,
               1170000, 1181000, 1210000, 1227000, 1264000,
               1309000, 1312000, 1316000, 1349000]

df = pd.DataFrame(time_series, index=dates, columns=['Series'])

model = ARIMA(df['Series'], order=(1, 1, 1))
result_model = model.fit()
residuals = result_model.resid

summary = result_model.summary()
residuals.plot(title='Residuals from ARIMA(1,1,1)')
plt.show()
print(summary)
print(residuals)

The output comes with warnings that seem related to the same issue. They are raised by model.fit():

UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

And then:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                 Series   No. Observations:                   24
Model:                 ARIMA(1, 1, 1)   Log Likelihood                -253.629
Date:                Thu, 18 Apr 2024   AIC                            513.259
Time:                        03:32:15   BIC                            516.665
Sample:                    01-01-1970   HQIC                           514.116
                         - 01-01-1993                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9999      0.021     48.639      0.000       0.960       1.040
ma.L1         -0.9983      0.258     -3.876      0.000      -1.503      -0.493
sigma2      2.799e+08   4.39e-10   6.38e+17      0.000     2.8e+08     2.8e+08
===================================================================================
Ljung-Box (L1) (Q):                   0.14   Jarque-Bera (JB):                 1.59
Prob(Q):                              0.71   Prob(JB):                         0.45
Heteroskedasticity (H):               1.95   Skew:                            -0.64
Prob(H) (two-sided):                  0.37   Kurtosis:                         3.04
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.17e+34. Standard errors may be unstable.

[Plot: residuals from the series, before trimming]

Erasing the first N elements from the residuals afterwards is easy. What I don't understand is how to do the same thing in model.fit(), or whether there's a parameter of model.fit() that I have misunderstood and should be setting.

model_diff_parameter = 1  # d from the ARIMA order
residuals = residuals[model_diff_parameter:]
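For comparison, here is a sketch of what I expected fit() to do internally: difference first, then fit an ARMA(1,1) on the 23 remaining values. The trend='n' argument is my own assumption, to suppress the constant that ARIMA adds by default when d=0; I'm not claiming this is numerically identical to ARIMA(1,1,1) on the undifferenced series.

from statsmodels.tsa.arima.model import ARIMA  # same import as above

# df is the DataFrame from the full example above
diffed = df['Series'].diff().dropna()     # 23 observations left
arma_model = ARIMA(diffed, order=(1, 0, 1), trend='n')
arma_result = arma_model.fit()
print(arma_result.summary())              # summary reports No. Observations: 23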

Upvotes: 0

Views: 86

Answers (0)
