Zophai

Reputation: 104

Loop python auto_arima through several columns in a wide data format

I will preface this by saying I am in no way a Python expert, but my current project demands that it be programmed in Python, so any help is appreciated. What I have is a transformed time series with monthly data (30 months) and 1,000+ items, one column per item.

I wish to run ARIMA on each of these columns. They are not dependent on each other, so in essence it's like running 1,000 independent ARIMA analyses.

I have programmed this functionality in R by creating a list of data frames, one per item, and looping through the list with R's auto.arima function. It was slow and clunky but got the job done.

In Python I couldn't find a way to create this structure and make it workable. Instead I found some code and tried to build a loop out of it. auto_arima does run on this, but it overwrites the results on every pass and I really don't know how to fix that.

I need to run auto_arima as the items have individual optimal P, D, Q parameters.

X is the data; its structure is: index, item1, item2, item3, ..., itemn

dict_org = {}
dict_pred = {}

for col in X:
    size = int(len(X) * 0.70)
    train, testdata = X[0:size], X[size:len(X)]
    history = [x for x in train[column]]
    predictions = list()

    for column in testdata:
        model = pm.auto_arima(history, start_p=1, start_q=1,
                      test='adf',       # use adftest to find optimal 'd'
                      max_p=3, max_q=3, # maximum p and q
                      m=1,              # frequency of series
                      d=None,           # let model determine 'd'
                      seasonal=False,   # No Seasonality
                      start_P=0, 
                      D=0, 
                      trace=True,
                      error_action='ignore',  
                      suppress_warnings=True, 
                      stepwise=True) # this works 

        output = model.predict()

        yhat = output[0]
        predictions.append(yhat)
        obs = testdata[column]
        history.append(obs)
        print("Predicted:%f, expected:%f" %(yhat, obs))

        error = mean_squared_error(testdata, predictions[:len(testdata)])
    print('Test MSE: %.3f' % error)

    dict_org.update({X[col]: testdata})
    dict_pred.update({X[col]: predictions})

    print("Item: ", X[col], "Test MSE:%f"% error)

What I want to get out is a dictionary of all the items and their predictions, similar to what I get by passing R's auto.arima over a list of data frames. As it stands, I keep overwriting yhat with a single observation, and I am at a loss.

I would greatly appreciate the help.

Upvotes: 3

Views: 1917

Answers (1)

Giovanna Fernandes

Reputation: 115

You've probably already found a solution for this by now, but I'll leave an answer in case anyone else stumbles upon it.

auto_arima is best thought of as an order-search tool rather than the final model: it tries candidate orders and returns the best-fitting one. In the case above, assign its result to a variable and read the order and seasonal order (and the AIC, if you want it) from that best model. You can write a small function for this part and then pass its output to the actual model.

import pmdarima as pm

def find_orders(ts):
    # Search for the best orders for a single series (one column of X)
    stepwise_model = pm.auto_arima(ts, start_p=1, start_q=1,
                      test='adf',       # use adftest to find optimal 'd'
                      max_p=3, max_q=3, # maximum p and q
                      m=1,              # frequency of series
                      d=None,           # let model determine 'd'
                      seasonal=False,   # No Seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

    return stepwise_model.order, stepwise_model.seasonal_order

Then you can write another function for the modeling part (say you call it fit_arima) and pass it the order and seasonal order for each time series in your loop.

for column in X:
    ts = X[column]  # one item's series
    order, seasonal_order = find_orders(ts)
    # fit_arima is the modeling function you write yourself
    fit_arima(ts, order=order, seasonal_order=seasonal_order)
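
To end up with a dictionary keyed by item, as the question asks for, the loop could look roughly like the sketch below. The names forecast_each_column, auto_arima_forecast and naive_forecast, the default 70/30 split, and the layout of the result dictionary are all my own assumptions for illustration; none of them are part of pmdarima. The only pmdarima calls used are pm.auto_arima and the returned model's predict(n_periods=...).

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def forecast_each_column(
    columns: Iterable[Tuple[str, Sequence[float]]],
    forecast_fn: Callable[[List[float], int], Sequence[float]],
    train_frac: float = 0.7,
) -> Dict[str, dict]:
    """Fit and forecast every (name, values) pair independently.

    `columns` can be `X.items()` for a wide pandas DataFrame, or the
    `.items()` of a plain dict of lists. Results are stored per column
    name, so nothing is overwritten between iterations.
    """
    results: Dict[str, dict] = {}
    for name, series in columns:
        values = [float(v) for v in series]
        size = int(len(values) * train_frac)
        train, test = values[:size], values[size:]
        if not train or not test:
            raise ValueError(f"column {name!r} is too short to split")

        # Forecast the whole hold-out period in one call.
        predicted = [float(p) for p in forecast_fn(train, len(test))]
        mse = sum((p - o) ** 2 for p, o in zip(predicted, test)) / len(test)
        results[name] = {"actual": test, "predicted": predicted, "mse": mse}
    return results


def auto_arima_forecast(train: List[float], n_periods: int) -> Sequence[float]:
    """Hypothetical adapter: pick orders with auto_arima, then forecast."""
    import pmdarima as pm  # imported lazily; only needed for this adapter

    model = pm.auto_arima(train, start_p=1, start_q=1,
                          max_p=3, max_q=3,
                          test='adf', seasonal=False,
                          error_action='ignore',
                          suppress_warnings=True,
                          stepwise=True)
    return model.predict(n_periods=n_periods)


def naive_forecast(train: List[float], n_periods: int) -> List[float]:
    """Dependency-free stand-in: repeat the last training value."""
    return [train[-1]] * n_periods


# Usage with the wide DataFrame X from the question (assumed layout):
#   results = forecast_each_column(X.items(), auto_arima_forecast)
#   results["item1"]["predicted"], results["item1"]["mse"]
```

Note that this forecasts the whole test period in one go rather than refitting after every new observation, as the loop in the question tries to do. If you need the one-step-ahead behaviour, the refit would go inside forecast_fn. Swapping in naive_forecast is a quick way to check the bookkeeping before paying for 1,000 ARIMA searches.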

Hope that helps!

Upvotes: 1
