Reputation: 679
Using Python, I am trying to predict the future sales count of a product, using historical sales data. I am also trying to predict these counts for various groups of products.
For example, my columns look like this:
Date, Sales_count, Department, Item, Color
8/1/2018, 50, Homegoods, Hats, Red_hat
If I want to build a model that predicts the sales_count for each Department/Item/Color combo using historical data (time), what is the best model to use?
If I run a linear regression of sales against time, how do I account for the various categories? Can I group them?
Would I instead use multilinear regression, treating the various categories as independent variables?
Upvotes: 1
Views: 382
Reputation: 3421
The best approach I have come across for forecasting in Python is the SARIMAX (Seasonal Auto Regressive Integrated Moving Average with Exogenous Variables) model in the statsmodels library. Here is the link for a very good tutorial on SARIMAX in Python. Also, if you are able to group the data frame by your Department/Item/Color combo, you can loop over the groups and apply the same model to each one. Maybe you can create a key for each unique combination and forecast the sales for each key. For example,
import pandas as pd

df = pd.read_csv('your_file.csv')
# Build one key per Department/Item/Color combination
df['key'] = df['Department'] + '_' + df['Item'] + '_' + df['Color']
for key in df['key'].unique():
    # Keep only the rows for this specific group
    temp = df.loc[df['key'] == key]
    # Sum the sales per date (skip if there is already one row per date)
    temp = temp.groupby('Date')['Sales_count'].sum().reset_index()
    # Write the forecasting code here from the tutorial
Upvotes: 1