Bogdanovist

Reputation: 1546

ARIMA out of sample prediction in statsmodels?

I have a time series forecasting problem that I am using the statsmodels Python package to address. Evaluated by the AIC criterion, the optimal model turns out to be quite complex, something like ARIMA(27,1,8) [I haven't done an exhaustive search of the parameter space, but it seems to be at a minimum around there]. I am having real trouble validating and forecasting with this model, though, because a single model instance takes a very long time (hours) to train, so doing repeated tests is very difficult.

In any case, what I really need as a minimum in order to use statsmodels in operations (assuming I can get the model validated somehow first) is a mechanism for incorporating new data as it arrives in order to make the next set of forecasts. I would like to be able to fit a model on the available data, pickle it, and then unpickle it later when the next data point is available and incorporate that into an updated set of forecasts. At the moment I have to re-fit the model each time new data becomes available, which, as I said, takes a very long time.

I had a look at this question, which addresses essentially the problem I have, but for ARMA models. For the ARIMA case, however, there is the added complexity of the data being differenced. I need to be able to produce new forecasts of the original time series (cf. the typ='levels' keyword in the ARIMAResultsWrapper.predict method). It's my understanding that statsmodels cannot do this at present, but what components of the existing functionality would I need to use in order to write something to do this myself?
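If you do end up rolling your own, note that inverting first-order differencing is just a cumulative sum anchored at the last observed level, which is essentially what typ='levels' does internally. A small NumPy illustration for the d=1 case (the numbers are hypothetical):

```python
import numpy as np

# Last observed value of the original series, and hypothetical
# forecasts of the *differenced* series from an ARMA-style model
last_level = 103.2
diff_forecasts = np.array([0.5, -0.2, 0.1])

# Undo the differencing: cumulatively sum the predicted differences
# and add them to the last known level of the original series
level_forecasts = last_level + np.cumsum(diff_forecasts)
print(level_forecasts)  # [103.7 103.5 103.6]
```

Higher orders of differencing invert the same way, applied d times, each anchored at the appropriate final observed values.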

Edit: I am also using transparams=True, so the prediction process needs to be able to transform the predictions back into the original time series, which is an additional difficulty in a homebrew approach.

Upvotes: 3

Views: 2316

Answers (1)

Nathan Gould

Reputation: 8225

An ARIMA(27,1,8) model is extremely complex, in the scheme of things. For most time series, you can do reasonable prediction with five or so parameters. Of course it depends on the data and domain, but I'm very skeptical that 27 + 8 = 35 parameters are necessary.

The AIC is known to be too permissive with the number of parameters at times. I'd try comparing the results with the BIC, which penalizes extra parameters more heavily.

I'd also look into whether your data has seasonality of some kind. E.g., maybe all 27 of those AR terms don't matter, and you really just need lag=1, and lag=24 (for instance). That might be the case for hourly data that has daily seasonality.

Upvotes: 0
