Reputation: 103
I have a dataset that contains multiple time series, and I want predictions for each of the time series in that group. Let me explain with an example:
Month StoreName Product Sales
01/21 A Pasta 100
02/21 A Pasta 70
03/21 A Pasta 140
02/21 A Rice 30
03/21 A Rice 10
04/21 A Rice 30
03/21 B pasta 200
04/21 B pasta 30
01/21 B Rice 120
03/21 B Rice 180
04/21 B Rice 100
Now, for a given StoreName and Product, what will the sales be in the upcoming months? There are a few things to note here.
Because the number of StoreName/Product combinations is high, is it possible to model all of these time series without looping over each one? Any sort of solution in Python is much appreciated.
Thanks in advance.
Upvotes: 1
Views: 1895
Reputation: 2605
I expect the statsmodels package will have what you're looking for, which appears to be predicting a numeric value based on a mix of other numeric and categorical predictor variables. You have time-series data, which makes this a little trickier, but as a first exploration you could encode the month of the year as its own column and then use the statsmodels ordinary least squares (OLS) model to get started with the analysis:
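```python
import statsmodels.formula.api as smf

# df holds the question's columns: Month, StoreName, Product, Sales.
# String columns are dummy-encoded automatically by the formula interface.
model = smf.ols(formula='Sales ~ Month + StoreName + Product', data=df)
results = model.fit()  # fit() returns a results object, not residuals
print(results.summary())
```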
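That will give you a regression table similar to the following (this one comes from a larger example dataset with 85 observations, so the numbers will not match the sample above). From there you can continue exploring and better incorporate the time-series element of your data.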
```text
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Sales   R-squared:                        0.338
Model:                           OLS   Adj. R-squared:                   0.287
Method:                Least Squares   F-statistic:                      6.636
Date:               Thu, 25 Mar 2021   Prob (F-statistic):            1.07e-05
Time:                       19:37:47   Log-Likelihood:                 -375.30
No. Observations:                 85   AIC:                              764.6
Df Residuals:                     78   BIC:                              781.7
Df Model:                          6
Covariance Type:           nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     38.6517      9.456      4.087      0.000      19.826      57.478
Product      -15.4278      9.727     -1.586      0.117     -34.793       3.938
StoreName    -10.0170      9.260     -1.082      0.283     -28.453       8.419
Month         -4.5483      7.279     -0.625      0.534     -19.039       9.943
==============================================================================
```
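To connect this back to the original question (forecasting each series without a loop), here is a minimal, self-contained sketch using the question's sample data. It is one pooled OLS model fitted to all series at once, with store and product dummies, rather than one model per series. It assumes Month is month/two-digit-year and replaces it with a numeric month index `t` (a hypothetical column name) so the model can extrapolate to a month it has not seen; `future` is likewise a hypothetical name. With only 11 rows the forecasts are illustrative, not meaningful.

```python
import pandas as pd
import statsmodels.formula.api as smf

# The question's sample data ("pasta" kept lowercase, exactly as posted).
df = pd.DataFrame({
    "Month": ["01/21", "02/21", "03/21", "02/21", "03/21", "04/21",
              "03/21", "04/21", "01/21", "03/21", "04/21"],
    "StoreName": ["A"] * 6 + ["B"] * 5,
    "Product": ["Pasta"] * 3 + ["Rice"] * 3 + ["pasta"] * 2 + ["Rice"] * 3,
    "Sales": [100, 70, 140, 30, 10, 30, 200, 30, 120, 180, 100],
})
# Assumption: Month is month/two-digit-year, e.g. "01/21" -> January 2021.
df["Month"] = pd.to_datetime(df["Month"], format="%m/%y")

# Hypothetical numeric time index: months since the first month in the data.
t0 = df["Month"].min()
df["t"] = (df["Month"].dt.year - t0.year) * 12 + (df["Month"].dt.month - t0.month)

# One pooled model for every series at once: no per-series loop.
results = smf.ols("Sales ~ t + StoreName + Product", data=df).fit()

# Forecast May 2021 (t = 4) for every StoreName/Product pair in one call.
future = df[["StoreName", "Product"]].drop_duplicates().assign(t=4)
future["Forecast"] = results.predict(future)
print(future)
```

The trade-off is that a pooled model shares one trend slope across all series; it scales to many series easily, but it will not capture series-specific dynamics the way a per-series model would.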
For more info, I expect the statsmodels documentation will get you off to a good start.
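One common way to incorporate the time-series element into a regression like this (not necessarily what this answer had in mind) is a lagged-sales predictor. The sketch below adds a hypothetical `Sales_lag1` column holding each series' sales from the previous calendar month, computed for all series at once by shifting the data forward one month and merging it back on the series keys. It assumes the same month/two-digit-year format as above. Because the merge is keyed on the calendar month, a gap (such as store B's Rice having no 02/21 row) yields NaN rather than silently reusing an older value.

```python
import pandas as pd

# The question's sample data ("pasta" kept lowercase, exactly as posted).
df = pd.DataFrame({
    "Month": ["01/21", "02/21", "03/21", "02/21", "03/21", "04/21",
              "03/21", "04/21", "01/21", "03/21", "04/21"],
    "StoreName": ["A"] * 6 + ["B"] * 5,
    "Product": ["Pasta"] * 3 + ["Rice"] * 3 + ["pasta"] * 2 + ["Rice"] * 3,
    "Sales": [100, 70, 140, 30, 10, 30, 200, 30, 120, 180, 100],
})
df["Month"] = pd.to_datetime(df["Month"], format="%m/%y")

# Move every row forward one calendar month and rename its sales column,
# so that after merging, each row sees its own series' previous month.
prev = df.assign(Month=df["Month"] + pd.DateOffset(months=1))
prev = prev.rename(columns={"Sales": "Sales_lag1"})

# Left merge on the series keys plus month: vectorized, no per-series loop.
# Rows with no previous-month observation get NaN.
df = df.merge(prev, on=["Month", "StoreName", "Product"], how="left")
print(df.sort_values(["StoreName", "Product", "Month"]))
```

Rows with a NaN lag would need to be dropped or imputed before the lag is used as a predictor.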
Another class of models to look into is ARIMA, though you need to make sure your data is stationary (the differencing, or "I", part of ARIMA is there to help with that), and it's harder to get in and do exploratory analysis in under five minutes with ARIMA models.
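As a small illustration of the differencing step mentioned above (the d in ARIMA(p, d, q)), this sketch takes the first difference of every StoreName/Product series at once by pivoting to one column per series. It assumes the same month/two-digit-year format as before and reindexes to a complete monthly range so that each difference is between consecutive calendar months. It only prepares the data; it does not fit an ARIMA model, and a handful of points per series is far too few to test for stationarity.

```python
import pandas as pd

# The question's sample data ("pasta" kept lowercase, exactly as posted).
df = pd.DataFrame({
    "Month": ["01/21", "02/21", "03/21", "02/21", "03/21", "04/21",
              "03/21", "04/21", "01/21", "03/21", "04/21"],
    "StoreName": ["A"] * 6 + ["B"] * 5,
    "Product": ["Pasta"] * 3 + ["Rice"] * 3 + ["pasta"] * 2 + ["Rice"] * 3,
    "Sales": [100, 70, 140, 30, 10, 30, 200, 30, 120, 180, 100],
})
df["Month"] = pd.to_datetime(df["Month"], format="%m/%y")

# Wide layout: one column per (StoreName, Product) series, one row per month.
wide = df.pivot_table(index="Month", columns=["StoreName", "Product"], values="Sales")
# Make every calendar month explicit so missing months show up as NaN.
wide = wide.reindex(pd.date_range(wide.index.min(), wide.index.max(), freq="MS"))

# First difference of every series at once (d = 1). A month is NaN when it
# or the month before it has no observation.
diff1 = wide.diff()
print(diff1)
```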
Upvotes: 1