Loop python auto_arima through several columns in a wide data format

Question

I will preface this by saying I am in no way a Python expert, but my current project has to be written in Python, so any help is appreciated. What I have is a transformed time series with monthly data (30 months) for 1000+ items.

I wish to run ARIMA on each of these columns. They are not dependent on each other. In essence it's like running 1000 independent ARIMA analyses.

I have programmed this functionality in R by creating a list of data frames for each item and looping through the list with R's auto arima function. It was slow and clunky but got the job done.

In Python I could not find a way to build an equivalent structure and make it workable. Instead I found some code and tried to turn it into a loop. auto_arima does run on this, but each pass overwrites the previous results and I really don't know how to fix that.

I need to run auto_arima because each item has its own optimal (p, d, q) order.

X is the data; its structure is: index, item1, item2, item3, ..., itemN

import pmdarima as pm
from sklearn.metrics import mean_squared_error

dict_org = {}
dict_pred = {}

for col in X:
    size = int(len(X) * 0.70)
    train, testdata = X[0:size], X[size:len(X)]
    history = [x for x in train[column]]
    predictions = list()

    for column in testdata:
        model = pm.auto_arima(history, start_p=1, start_q=1,
                      test='adf',       # use adftest to find optimal 'd'
                      max_p=3, max_q=3, # maximum p and q
                      m=1,              # frequency of series
                      d=None,           # let model determine 'd'
                      seasonal=False,   # No Seasonality
                      start_P=0, 
                      D=0, 
                      trace=True,
                      error_action='ignore',  
                      suppress_warnings=True, 
                      stepwise=True) # this works 

        output = model.predict()

        yhat = output[0]
        predictions.append(yhat)
        obs = testdata[column]
        history.append(obs)
        print("Predicted:%f, expected:%f" %(yhat, obs))

        error = mean_squared_error(testdata, predictions[:len(testdata)])
    print('Test MSE: %.3f' % error)

    dict_org.update({X[col]: testdata})
    dict_pred.update({X[col]: predictions})

    print("Item: ", X[col], "Test MSE:%f"% error)

What I want to get out is a dictionary of all the items and their predictions, similar to what I get by passing R's auto arima over a list of data frames. As it stands, yhat just keeps being overwritten with a single observation, and I am at a loss.

I would greatly appreciate the help.
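
One way to get the dictionary described above is to loop over the columns by name, walk forward through each column's held-out observations one at a time, and store the results under the column name rather than under X[col] (a Series cannot be used as a dictionary key). The sketch below is an illustration under stated assumptions, not a solution verified against this data set: the names walk_forward and auto_arima_one_step are made up for the example, the auto_arima settings are copied from the question, the date is assumed to be the DataFrame index rather than a column, and the MSE is computed by hand. The forecaster is passed in as a function so that the loop itself can be checked with a trivial last-value forecaster before pmdarima is plugged in.

```python
from typing import Callable, Sequence


def auto_arima_one_step(history: Sequence[float]) -> float:
    """Fit auto_arima on `history` and return the one-step-ahead forecast."""
    import pmdarima as pm  # imported lazily: only needed for the real forecaster

    model = pm.auto_arima(
        history, start_p=1, start_q=1, max_p=3, max_q=3,
        test="adf", d=None, seasonal=False,
        error_action="ignore", suppress_warnings=True, stepwise=True,
    )
    return float(list(model.predict(n_periods=1))[0])


def walk_forward(X, train_frac: float = 0.70,
                 one_step: Callable[[Sequence[float]], float] = auto_arima_one_step):
    """Walk-forward forecast of every column of X, each one independently.

    X is anything whose .items() yields (name, values) pairs, for example a
    wide pandas DataFrame (assumed: dates in the index) or a dict of lists.
    Returns three dicts keyed by column name: observed test values,
    one-step predictions, and the test MSE.
    """
    dict_org, dict_pred, dict_mse = {}, {}, {}
    for name, series in X.items():
        values = [float(v) for v in series]
        size = int(len(values) * train_frac)
        history, test = values[:size], values[size:]
        predictions = []
        for obs in test:                 # iterate over observations, not columns
            predictions.append(one_step(history))
            history.append(obs)          # reveal the true value before the next step
        dict_org[name] = test            # key by column name, not by X[col]
        dict_pred[name] = predictions
        squared = [(o - p) ** 2 for o, p in zip(test, predictions)]
        dict_mse[name] = sum(squared) / len(squared) if squared else float("nan")
    return dict_org, dict_pred, dict_mse


# Check the loop with a naive "repeat the last value" forecaster (no pmdarima needed).
demo = {"item1": [1.0, 2.0, 3.0, 4.0], "item2": [5.0, 5.0, 5.0, 5.0]}
org, pred, mse = walk_forward(demo, train_frac=0.5, one_step=lambda h: h[-1])
print(pred)
print(mse)
```

With the real data, calling walk_forward(X) on the wide DataFrame should return the three dictionaries keyed by item name. Because auto_arima is refitted at every step, a 70/30 split of 30 months means roughly nine fits per item, so 1000+ items will take a while.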
