anthr

Reputation: 1036

Predictions with ARIMA (python statsmodels)

I have some time series data that contains seasonal trends, and I want to use an ARIMA model to predict how this series will behave in the future.

In order to predict how my variable of interest (log_var) will behave, I have taken weekly, monthly and annual differences and then used these as inputs to an ARIMA model.

Below is an example.

import numpy as np
# note: in older statsmodels versions ARIMA lives in statsmodels.tsa.arima_model
from statsmodels.tsa.arima.model import ARIMA

# stack the pre-computed seasonal differences as exogenous regressors
exog = np.column_stack([df_arima['log_var_diff_wk'],
                        df_arima['log_var_diff_mth'],
                        df_arima['log_var_diff_yr']])

model = ARIMA(df_arima['log_var'], exog=exog, order=(1, 0, 1))
results_ARIMA = model.fit()

I am doing this for several different data sources, and in all of them I see great results, in the sense that if I plot log_var against results_ARIMA.fittedvalues for the training data they match very well. (I tune p and q for each data source separately, but d is always 0, since I have already taken the differences myself.)

However, I then want to check what the predictions look like, and to do this I redefine exog to be just the 'test' dataset. For example, if I train the original ARIMA model on 2014-01-01 to 2016-01-01, the 'test' set would be 2016-01-01 onwards.

My approach has worked well for some data sources (in the sense that when I plot the forecast against the known values the trends look sensible) but badly for others, even though they are all the same 'kind' of data, just taken from different geographical locations. In some of the locations it completely fails to catch obvious seasonal trends that recur in the training data on the same dates each year. The ARIMA model always fits the training data well; it just seems that in some cases the predictions are completely useless.

I am now wondering if I am actually following the correct procedure to predict values from the ARIMA model. My approach is basically:

exog = np.column_stack([df_arima_predict['log_var_diff_wk'],
                        df_arima_predict['log_var_diff_mth'],
                        df_arima_predict['log_var_diff_yr']])

arima_predict = results_ARIMA.predict(start=training_cut_date, end='2017-01-01',
                                      dynamic=False, exog=exog)

Is this the correct way to go about making predictions with ARIMA?

If so, is there a way I can try to understand why the predictions look very good in some datasets and terrible in others, when the ARIMA model seems to fit the training data just as well in both cases?

Upvotes: 3

Views: 1780

Answers (1)

Evert van Doorn

Reputation: 91

I have a similar problem at the moment which I have not entirely figured out yet. It seems that including multiple seasonal terms in Python is still a bit tricky; R does seem to have this capacity, see here. So, one suggestion I can give you is to try this with the more sophisticated functionality R provides for now (although that could require a large investment of time if you are not familiar with R yet).

Looking at your approach for modeling the seasonal patterns: taking the nth-order difference scores does not give you seasonal constants, but rather some representation of the difference between the time points that you designate as seasonally related. If those differences are small, correcting for them might not have much impact on your modeling results, and in such cases the predictions might turn out fairly well. Conversely, if the differences are big, including them can easily distort the predictions. This could explain the variation you are seeing across your data sources. Conceptually, then, what you'd want to do instead is represent the seasonal constants over time.

In the blog post referenced above, the author advocates the use of Fourier series to model the variance within each time period. Both the NumPy and SciPy packages offer routines for calculating the fast Fourier transform. However, as a non-mathematician I found it difficult to ascertain that the fast Fourier transform yielded the appropriate numbers.
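For reference, a rough sketch of what that FFT inspection might look like with NumPy (the column name df_arima['log_var'] is carried over from the question; everything else is illustrative, not a tested recipe):

import numpy as np

x = df_arima['log_var'].values

# rfft gives the spectrum of a real-valued series; rfftfreq gives the
# matching frequencies in cycles per sample (cycles per day for daily data)
spectrum = np.fft.rfft(x - x.mean())
freqs = np.fft.rfftfreq(len(x), d=1.0)

# large magnitudes flag candidate seasonal frequencies
amplitudes = np.abs(spectrum)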

In the end I opted to use the Welch signal decomposition from SciPy's signal module. This returns a spectral density estimate of your time series, from which you can deduce the signal strength at the various frequencies present in your data.
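For illustration, a minimal sketch of that step, assuming a daily series stored in df_arima['log_var'] as in the question (the sampling frequency and segment length are assumptions):

import numpy as np
from scipy import signal

x = df_arima['log_var'].values

# Welch's method averages periodograms over overlapping segments;
# with fs=1.0 the frequencies come out in cycles per day, and nperseg
# trades frequency resolution against the variance of the estimate
freqs, psd = signal.welch(x, fs=1.0, nperseg=512)

# the strongest peaks mark dominant cycles; 1/freq is the period in days,
# e.g. a peak near freq = 1/7 corresponds to a weekly season
for i in psd.argsort()[::-1][:5]:
    if freqs[i] > 0:
        print("period ~ %.1f days, power %.3g" % (1 / freqs[i], psd[i]))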

If you identify the peaks in the spectral density analysis which correspond to the seasonal frequencies you are trying to account for in your time series, you can use their frequencies and amplitudes to construct sine waves representing the seasonal variations. You can then include these in your ARIMA as exogenous variables, much like the Fourier terms in the blog post.
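A minimal sketch of that construction, assuming daily data and that the spectral analysis pointed at weekly and yearly cycles (the periods and variable names here are assumptions, not tested results):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

t = np.arange(len(df_arima))
periods = [7.0, 365.25]  # assumed seasonal periods, in days

# sine/cosine pairs at each seasonal frequency; fitting both terms
# lets the regression absorb the phase of each cycle
seasonal_exog = np.column_stack([f(2 * np.pi * t / p)
                                 for p in periods
                                 for f in (np.sin, np.cos)])

model = ARIMA(df_arima['log_var'], exog=seasonal_exog, order=(1, 0, 1))
results = model.fit()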

This is about as far as I have gotten myself at this point: right now I am trying to figure out whether I can get the statsmodels ARIMA process to use these sine waves, which specify a seasonal trend, as exogenous variables in my model (the documentation specifies they should not represent trends, but hey, a guy can dream, right?).

Edit: this blog post by Rob Hyndman is also highly relevant, and explains some of the rationale behind including Fourier terms.

Sorry I'm not able to give you a solution that's proven to be effective within Python, but I hope this gives you some new ideas to control for that pesky seasonal variance.

TL;DR:

  • It seems Python is not very well suited to handling multiple seasonal terms right now; R might be a better solution (see reference);

  • Using difference scores to account for seasonal trends does not seem to capture the constant variance associated with the recurrence of the season;

  • One way to do this in Python could be to use Fourier series representing seasonal trends (also see reference), which can be obtained using, among other methods, a Welch signal decomposition. How to use these as exogenous variables in an ARIMA to good effect is an open question, though.

Best of luck,

Evert

p.s.: I'll update if I find a way to get this to work in Python

Upvotes: 0
