I can't work out whether I'm using the ARIMA model from statsmodels correctly. Although I'm differencing, and the series does need differencing, the ARIMA model itself does not drop the first N values corresponding to the differencing order. This shows up in the residuals and in the summary, which still counts 24 observations even though (as far as I understand time series) the non-differenced values should be dropped.
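Just to illustrate what I mean by dropping the first values: pandas differencing turns the first point into a NaN (there is nothing to subtract it from), so with d=1 I would expect the fit to effectively use only 23 observations, not 24. A minimal illustration:

import pandas as pd

s = pd.Series([908000, 902000, 930000, 938000])
print(s.diff())           # first value becomes NaN
print(s.diff().dropna())  # only len(s) - 1 values remain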
Here's an example:
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
dates = pd.date_range(start='1970-01-01', periods=24, freq='YS')
time_series = [908000, 902000, 930000, 938000, 946000, 961000, 982000, 1002000,
1024000, 1006000, 1031000, 1047000, 1077000, 1103000, 1136000,
1170000, 1181000, 1210000, 1227000, 1264000,
1309000, 1312000, 1316000, 1349000]
df = pd.DataFrame(time_series, index=dates, columns=['Series'])
model = ARIMA(df['Series'], order=(1, 1, 1))
result_model = model.fit()
residuals = result_model.resid
summary = result_model.summary()
residuals.plot(title='Residuals from ARIMA(1,1,1)')
plt.show()
print(summary)
print(residuals)
The output comes with warnings that seem to stem from the same problem. They are raised by model.fit():
UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
warn('Non-stationary starting autoregressive parameters'
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
warnings.warn("Maximum Likelihood optimization failed to "
And then:
                               SARIMAX Results
==============================================================================
Dep. Variable:                 Series   No. Observations:                   24
Model:                 ARIMA(1, 1, 1)   Log Likelihood                -253.629
Date:                Thu, 18 Apr 2024   AIC                            513.259
Time:                        03:32:15   BIC                            516.665
Sample:                    01-01-1970   HQIC                           514.116
                         - 01-01-1993
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9999      0.021     48.639      0.000       0.960       1.040
ma.L1         -0.9983      0.258     -3.876      0.000      -1.503      -0.493
sigma2      2.799e+08   4.39e-10   6.38e+17      0.000     2.8e+08     2.8e+08
===================================================================================
Ljung-Box (L1) (Q):                   0.14   Jarque-Bera (JB):                 1.59
Prob(Q):                              0.71   Prob(JB):                         0.45
Heteroskedasticity (H):               1.95   Skew:                            -0.64
Prob(H) (two-sided):                  0.37   Kurtosis:                         3.04
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.17e+34. Standard errors may be unstable.
It's easy to drop the first N elements from the residuals by hand. What I don't understand is how to do the same for model.fit(), or whether model.fit() has a parameter for this that I haven't understood how to use:
model_diff_parameter = 1
residuals = residuals[model_diff_parameter:]
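The only workaround I can think of is to difference the series myself and then fit an ARMA(1, 1), i.e. d=0, on the differenced values, so that the fit really only sees 23 observations. Here's a sketch of that idea (I'm not sure it is statistically equivalent to what ARIMA(1, 1, 1) does internally, which is part of what I'm asking):

# Possible workaround: difference manually, then fit with d=0
diff_series = df['Series'].diff().dropna()        # 23 values instead of 24
model_diff = ARIMA(diff_series, order=(1, 0, 1))  # ARMA(1, 1) on the differenced data
result_diff = model_diff.fit()
print(result_diff.summary())                      # reports 23 observations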