Reputation: 104
I will preface this by saying I am in no way a Python expert, but my current project has to be programmed in Python, so any help is appreciated. What I have is a transformed time series of monthly data (30 months) with more than 1,000 items, one column per item.
I wish to run ARIMA on each of these columns. They do not depend on each other, so in essence it's like running 1,000 independent ARIMA analyses.
I have programmed this functionality in R by creating a list of data frames for each item and looping through the list with R's auto arima function. It was slow and clunky but got the job done.
Doing it in Python I didn't find a way to create this structure and make it workable. Instead I found some code and tried to create a loop out of it. Now, the auto_arima runs on this, but it overwrites the results and I really don't know how to make this workable.
I need to run auto_arima as the items have individual optimal P, D, Q parameters.
X is the data; its structure is: index, item1, item2, item3, ..., itemn
    dict_org = {}
    dict_pred = {}
    for col in X:
        size = int(len(X) * 0.70)
        train, testdata = X[0:size], X[size:len(X)]
        history = [x for x in train[column]]
        predictions = list()
        for column in testdata:
            model = pm.auto_arima(history, start_p=1, start_q=1,
                                  test='adf',        # use adftest to find optimal 'd'
                                  max_p=3, max_q=3,  # maximum p and q
                                  m=1,               # frequency of series
                                  d=None,            # let model determine 'd'
                                  seasonal=False,    # No Seasonality
                                  start_P=0,
                                  D=0,
                                  trace=True,
                                  error_action='ignore',
                                  suppress_warnings=True,
                                  stepwise=True)     # this works
            output = model.predict()
            yhat = output[0]
            predictions.append(yhat)
            obs = testdata[column]
            history.append(obs)
            print("Predicted:%f, expected:%f" %(yhat, obs))
        error = mean_squared_error(testdata, predictions[:len(testdata)])
        print('Test MSE: %.3f' % error)
        dict_org.update({X[col]: testdata})
        dict_pred.update({X[col]: predictions})
        print("Item: ", X[col], "Test MSE:%f"% error)
What I want to get out is a dictionary of all the items and their predictions, similar to what I get by passing R's auto arima over a list of data frames. As it stands, I keep overwriting yhat with a single observation, and I am at a loss.
I would greatly appreciate the help.
Upvotes: 3
Views: 1917
Reputation: 115
You've probably already found a solution for this by now, but I'll leave an answer in case anyone else stumbles upon it.
auto_arima is not a model class you configure yourself. It is a search function that tries candidate orders and hands back the best model it finds. What you would do in the case above is assign its result to a variable and read off the order and seasonal order, as well as the AIC, of that best model. You can write a small function to perform this part, and then feed its output into the actual model.
    import pmdarima as pm

    def find_orders(ts):
        # search on the series passed in (ts), not on a global variable
        stepwise_model = pm.auto_arima(ts, start_p=1, start_q=1,
                                       test='adf',        # use adftest to find optimal 'd'
                                       max_p=3, max_q=3,  # maximum p and q
                                       m=1,               # frequency of series
                                       d=None,            # let model determine 'd'
                                       seasonal=False,    # No Seasonality
                                       start_P=0,
                                       D=0,
                                       trace=True,
                                       error_action='ignore',
                                       suppress_warnings=True,
                                       stepwise=True)
        return stepwise_model.order, stepwise_model.seasonal_order
Then, you can make another function for the modeling part - say you call it fit_arima - and pass the order and seasonal orders in your model for each time series on your loop.
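    for column in X:
        ts = X[column]  # one independent series per item
        order, seasonal_order = find_orders(ts)
        # fit_arima is your own modeling function (not defined here)
        fit_arima(ts, order=order, seasonal_order=seasonal_order)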
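To end up with one dictionary holding every item's predictions, the loop can store its results under the column name instead of overwriting them. Below is a minimal sketch of that idea, not a definitive implementation. The names forecast_all, naive_forecast and auto_arima_forecast are made up for illustration, and the sketch makes a single multi-step forecast per item rather than refitting after every new observation. naive_forecast is only a stand-in so the loop can be tried without pmdarima installed.

```python
def naive_forecast(train, n_periods):
    """Stand-in forecaster (not ARIMA): repeat the last training value."""
    return [train[-1]] * n_periods


def auto_arima_forecast(train, n_periods):
    """Let auto_arima pick the orders for this one series, then forecast."""
    import pmdarima as pm  # imported lazily so the sketch runs without it
    model = pm.auto_arima(train, seasonal=False, max_p=3, max_q=3,
                          error_action='ignore', suppress_warnings=True)
    return [float(v) for v in model.predict(n_periods=n_periods)]


def forecast_all(X, forecast_fn, train_frac=0.70):
    """Return {item: {'test': [...], 'pred': [...], 'mse': float}}."""
    results = {}
    # .items() yields (column name, column values) for a DataFrame,
    # and (key, value) for a plain dict of lists
    for name, series in X.items():
        values = [float(v) for v in series]
        size = int(len(values) * train_frac)
        train, test = values[:size], values[size:]
        pred = forecast_fn(train, len(test))
        mse = sum((p - t) ** 2 for p, t in zip(pred, test)) / len(test)
        # keyed by the column name, so nothing gets overwritten
        results[name] = {'test': test, 'pred': pred, 'mse': mse}
    return results
```

With your data this would be called as results = forecast_all(X, auto_arima_forecast), and results['item1']['pred'] would then hold the forecasts for item1. Passing naive_forecast instead lets you check the bookkeeping before running the slow model search on 1,000 columns.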
Hope that helps!
Upvotes: 1