Aakash Parsi

Forecasting each time series from a group of time series

I have a dataset that contains multiple time series, and I want a forecast for each individual series in the group. Here is an example:

    Month StoreName Product Sales
    01/21     A      Pasta   100
    02/21     A      Pasta   70
    03/21     A      Pasta   140
    02/21     A      Rice    30
    03/21     A      Rice    10
    04/21     A      Rice    30
    03/21     B      pasta   200
    04/21     B      pasta   30
    01/21     B      Rice    120
    03/21     B      Rice    180
    04/21     B      Rice    100

Now, for a given StoreName and Product, what will the sales be in the upcoming months? A few things to note:

  1. Intermittent demand (e.g. no sales recorded in February for store B).
  2. Encoding the categorical variables: there are 200+ products and 200+ stores.
  3. A separate time series model for each set ([A, Pasta], [A, Rice], [B, Pasta], [B, Rice]).
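One way to approach points 1 and 3 without an explicit Python loop is to pivot the data into a wide table with one column per (StoreName, Product) series. The sketch below assumes that `pasta` and `Pasta` are the same product and that a missing month means zero sales; the 3-month moving average is only a placeholder forecast to show that one vectorised operation covers every series at once.

```python
import io

import pandas as pd

raw = """Month StoreName Product Sales
01/21 A Pasta 100
02/21 A Pasta 70
03/21 A Pasta 140
02/21 A Rice 30
03/21 A Rice 10
04/21 A Rice 30
03/21 B pasta 200
04/21 B pasta 30
01/21 B Rice 120
03/21 B Rice 180
04/21 B Rice 100
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"Month": str})

# Assumption: 'pasta' and 'Pasta' are the same product, so normalise the case
df["Product"] = df["Product"].str.title()
df["Month"] = pd.to_datetime(df["Month"], format="%m/%y")

# One row per month, one column per (StoreName, Product) series
wide = df.pivot_table(index="Month", columns=["StoreName", "Product"],
                      values="Sales", aggfunc="sum")

# Insert any calendar months missing entirely, then treat a missing
# month as zero sales (an assumption about the intermittent demand)
wide = wide.asfreq("MS").fillna(0)

# Placeholder forecast: mean of the last 3 months, computed for all series at once
next_month = wide.tail(3).mean()
print(next_month)
```

With 200+ stores and 200+ products the wide table simply gains more columns; the same column-wise operations still apply without a per-series loop.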

Given the high cardinality, is it possible to model multiple time series without looping over every series? Any Python solution is much appreciated.

Thanks in advance.


Answers (1)

BLimitless

I expect the statsmodels package will have what you're looking for: you appear to be predicting a numeric value from a mix of numeric and categorical predictor variables. Time series data makes this a little trickier, but as a first exploration you could encode the month of the year as its own column, then use the statsmodels ordinary least squares (OLS) model to get started with the analysis:

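    import statsmodels.formula.api as smf

    # df is a pandas DataFrame with the columns Month, StoreName, Product, Sales
    model = smf.ols(formula='Sales ~ Month + StoreName + Product', data=df)
    results = model.fit()  # fitted results object, not the residuals
    print(results.summary())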

That will give you a regression table like the following, and from there you can continue exploring and better incorporate the time series element of your data.

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                Sales   R-squared:                       0.338
    Model:                            OLS   Adj. R-squared:                  0.287
    Method:                 Least Squares   F-statistic:                     6.636
    Date:                Thu, 25 Mar 2021   Prob (F-statistic):           1.07e-05
    Time:                        19:37:47   Log-Likelihood:                -375.30
    No. Observations:                  85   AIC:                             764.6
    Df Residuals:                      78   BIC:                             781.7
    Df Model:                           6
    Covariance Type:            nonrobust
    ==============================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      38.6517      9.456      4.087      0.000      19.826      57.478
    Product       -15.4278      9.727     -1.586      0.117     -34.793       3.938
    StoreName     -10.0170      9.260     -1.082      0.283     -28.453       8.419
    Month          -4.5483      7.279     -0.625      0.534     -19.039       9.943
    ==============================================================================
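A minimal sketch of the month-of-year encoding mentioned above, run on the question's sample rows. It assumes `pasta` and `Pasta` are the same product, and it wraps each predictor in `C()` so the formula treats it as categorical (one dummy column per level) rather than as a single number; `MonthOfYear` is a hypothetical column name.

```python
import io

import pandas as pd
import statsmodels.formula.api as smf

raw = """Month StoreName Product Sales
01/21 A Pasta 100
02/21 A Pasta 70
03/21 A Pasta 140
02/21 A Rice 30
03/21 A Rice 10
04/21 A Rice 30
03/21 B pasta 200
04/21 B pasta 30
01/21 B Rice 120
03/21 B Rice 180
04/21 B Rice 100
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"Month": str})
df["Product"] = df["Product"].str.title()  # assumption: 'pasta' == 'Pasta'

# Month of the year as its own column: '03/21' -> 3
df["MonthOfYear"] = df["Month"].str[:2].astype(int)

# C(...) marks each term as categorical, so it is expanded into dummy columns
results = smf.ols("Sales ~ C(MonthOfYear) + C(StoreName) + C(Product)",
                  data=df).fit()
print(results.params)
```

One consequence of the categorical month term: the model can only predict for months of the year it has already seen, so forecasting a full year ahead needs at least twelve months of history.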

For more info, I expect the statsmodels documentation here and here will get you off to a good start.

Another class of models to look into is ARIMA, though you need to make sure your data is stationary, and ARIMA models are harder to pick up for quick (under five minutes) exploratory analysis.