Reputation: 39
I am trying to forecast weekly sales for more than 2000 products. I resample each product's sales data to weekly totals, and each product's time series behaves differently. Seasonal patterns are not obvious, so I decided to run the auto_arima function in Python under two conditions: one that assumes seasonality and one that does not. For the seasonal case, I set the period to 52 weeks because the peaks in the seasonal decomposition recur after one year. My first question: is it good practice to run auto_arima under both conditions and keep whichever model (ARIMA or SARIMAX) gives the lowest MSE? My second question: auto_arima is very slow when it searches for the order of the SARIMAX model. I would be glad to hear any advice on speeding it up, as well as on my first question.
Thanks.
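import pandas as pd
from math import sqrt
from pmdarima import auto_arima  # assuming auto_arima comes from pmdarima
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Seasonal case: search SARIMA orders with m=52 for each item.
# Rows are collected in lists because DataFrame.append was removed in pandas 2.0.
model_rows, result_rows = [], []
for k in range(len(df_stationary_items)):
    test_df = grouped_df.get_group(df_stationary_items[k])
    X = test_df['Quantity'].values
    train, test = X[0:len(X)-1], X[len(X)-1:]  # hold out the last week
    try:
        # The order search uses the full series, including the held-out week.
        stepwise_fit = auto_arima(test_df['Quantity'], start_p=0, start_q=0,
                                  max_p=6, max_q=6, m=52,
                                  start_P=0, seasonal=True, alpha=0.05,
                                  d=None, D=None, max_D=1, trace=True, n_jobs=-1,
                                  error_action='ignore', stepwise=True)
        model_rows.append({"ItemNo": df_stationary_items[k], "Order": stepwise_fit.order,
                           "SeasonalOrder": stepwise_fit.seasonal_order})
        model = SARIMAX(train, order=stepwise_fit.order, seasonal_order=stepwise_fit.seasonal_order)
        model_fit = model.fit()
        predictions = model_fit.predict(start=len(train), end=len(train)+len(test)-1, dynamic=False)
        rmse = sqrt(mean_squared_error(test, predictions))
        # `result` is not defined in this snippet; it must come from an earlier step.
        result_rows.append({"ItemNo": df_stationary_items[k], "StationaryP": result[1],
                            "Order": stepwise_fit.order, "SeasonalOrder": stepwise_fit.seasonal_order,
                            "Predicted": predictions[0], "Expected": test[0],
                            "STDEV": test_df['Quantity'].std(), "rmse": rmse})
    except Exception:  # skip items where the search or the fit fails
        continue
df_models = pd.DataFrame(model_rows)
df_model_results = pd.DataFrame(result_rows)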
from statsmodels.tsa.arima.model import ARIMA  # assuming the current statsmodels ARIMA class

# Non-seasonal case: same procedure with seasonal=False.
df_test_results_nonseasonal = pd.DataFrame()
model_rows_nonseasonal, result_rows_nonseasonal = [], []
for m in range(len(df_stationary_items)):
    test_df_nonseasonal = grouped_df.get_group(df_stationary_items[m])
    X_non = test_df_nonseasonal['Quantity'].values
    train_non, test_non = X_non[0:len(X_non)-1], X_non[len(X_non)-1:]  # hold out the last week
    try:
        stepwise_nonseasonal = auto_arima(test_df_nonseasonal['Quantity'], error_action='ignore', seasonal=False)
        model_rows_nonseasonal.append({"ItemNo": df_stationary_items[m],
                                       "Order": stepwise_nonseasonal.order})
        model_non = ARIMA(train_non, order=stepwise_nonseasonal.order)
        model_fit_non = model_non.fit()
        predictions_non = model_fit_non.predict(start=len(train_non), end=len(train_non)+len(test_non)-1, dynamic=False)
        rmse_non = sqrt(mean_squared_error(test_non, predictions_non))
        # `result_non` is not defined in this snippet; it must come from an earlier step.
        result_rows_nonseasonal.append({"ItemNo": df_stationary_items[m], "StationaryP": result_non[1],
                                        "Order": stepwise_nonseasonal.order,
                                        "Predicted": predictions_non[0], "Expected": test_non[0],
                                        "STDEV": test_df_nonseasonal['Quantity'].std(), "rmse": rmse_non})
    except Exception:  # skip items where the search or the fit fails
        continue
df_models_nonseasonal = pd.DataFrame(model_rows_nonseasonal)
df_model_results_nonseasonal = pd.DataFrame(result_rows_nonseasonal)
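To make the first question concrete, this is roughly the selection step I have in mind once both loops have finished: join the two result tables on ItemNo and keep, per item, whichever model family has the lower rmse. It is only a sketch. The two small tables are made-up stand-ins for df_model_results and df_model_results_nonseasonal, and pick_best is a hypothetical helper name.

```python
import pandas as pd

# Made-up per-item results standing in for df_model_results and
# df_model_results_nonseasonal (illustration only, not real output).
seasonal = pd.DataFrame({"ItemNo": ["A", "B", "C"], "rmse": [2.0, 5.0, 1.5]})
nonseasonal = pd.DataFrame({"ItemNo": ["A", "B", "D"], "rmse": [3.0, 4.0, 0.5]})


def pick_best(seasonal, nonseasonal):
    """Return one row per item with the lower-RMSE model family."""
    # Outer join keeps items for which only one of the two fits succeeded.
    merged = seasonal.merge(nonseasonal, on="ItemNo", how="outer",
                            suffixes=("_seasonal", "_nonseasonal"))
    errors = merged[["rmse_seasonal", "rmse_nonseasonal"]]
    # min and idxmin skip NaN, so an item with a single fit keeps that fit.
    merged["best_rmse"] = errors.min(axis=1)
    merged["best_model"] = errors.idxmin(axis=1).str.replace("rmse_", "")
    return merged[["ItemNo", "best_model", "best_rmse"]]


best = pick_best(seasonal, nonseasonal)
print(best)
```

One caveat I am aware of: because only a single week is held out per item, each rmse here is just the absolute error of one forecast, so the comparison may be noisy.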
Any advice for forecasting of multiple products would be great!
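Regarding speed: as far as I understand from the pmdarima documentation, n_jobs only takes effect when stepwise=False, so n_jobs=-1 in the seasonal auto_arima call above probably does nothing. Since every item is fitted independently, one option I am considering is to parallelise across items instead. Below is a minimal sketch of that idea using only the standard library. The names fit_one_item and forecast_all are hypothetical, and the worker body is a naive last-value forecast standing in for the real auto_arima and SARIMAX calls so that the sketch runs on its own.

```python
from concurrent.futures import ProcessPoolExecutor


def fit_one_item(args):
    """Hypothetical stand-in for the per-item work.

    The real body would be the auto_arima search and SARIMAX fit from the
    loop in the question; here it is a naive last-value forecast.
    """
    item_no, quantities = args
    train, test = quantities[:-1], quantities[-1:]  # hold out the last week
    prediction = train[-1]                          # repeat the last observed week
    return {"ItemNo": item_no, "Predicted": prediction,
            "Expected": test[0], "abs_error": abs(test[0] - prediction)}


def forecast_all(series_by_item, max_workers=None):
    """Run fit_one_item for every item in separate worker processes."""
    tasks = list(series_by_item.items())
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunksize batches items to reduce inter-process overhead.
        return list(pool.map(fit_one_item, tasks, chunksize=16))


# Usage (keep the call under a main guard so worker processes start cleanly):
#   if __name__ == "__main__":
#       results = forecast_all({"A": [5, 7, 6, 8], "B": [0, 1, 0, 2]})
```

Processes rather than threads are used here on the assumption that model fitting is CPU-bound; whether this actually helps for 2000 items is part of what I am asking.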
Upvotes: 1
Views: 1102