MLRookie33

Reputation: 1

How can I use Prophet for per-location forecasting given overall sales data

I am using Prophet for sales forecasting, and I have several CSVs. Most of them represent sales data by date for a specific location (e.g. Location1.CSV has "Jan 1, 2010, X widgets sold", etc.).

There's a master CSV which aggregates sales across all locations. I have used Prophet to forecast Sales across all locations and that works well, but the per-location data is very variable.

I'm seeing a much higher Mean Absolute Error (MAE) for the per-location forecasts, while the overall model has a much lower MAE.

Is there any way I can use the overall Sales model to try to predict per-location sales? Or any alternatives to forecasting per-location Sales besides just using the raw sales data for that location?

Upvotes: 0

Views: 524

Answers (1)

queise

Reputation: 2416

Yes, you can use your overall sales model to help predict the per-location sales in Prophet using the add_regressor method.

Let's first create a sample df, where y is the variable we want to predict (per-location sales) and overalls are the overall sales:

import pandas as pd
df = pd.DataFrame(pd.date_range(start="2019-09-01", end="2019-09-30", freq='D', name='ds'))
df["y"] = range(1,31)
df["overalls"] = range(101,131)
df.head()
            ds  y   overalls
0   2019-09-01  1   101
1   2019-09-02  2   102
2   2019-09-03  3   103
3   2019-09-04  4   104
4   2019-09-05  5   105

and split train and test:

df_train = df.loc[df["ds"]<"2019-09-21"]
df_test  = df.loc[df["ds"]>="2019-09-21"]

Before training the forecaster, we can add regressors that use additional variables. Here the argument of add_regressor is the column name of the additional variable in the training df.

from prophet import Prophet  # the package was named fbprophet before v1.0
m = Prophet()
m.add_regressor('overalls')
m.fit(df_train)

The predict method will then use the additional variables to forecast:

forecast = m.predict(df_test.drop(columns="y"))
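To check whether the extra regressor actually helps, one option is to compare the MAE on the held-out rows with and without it. A minimal sketch in plain Python; mean_absolute_error is a hypothetical helper (not part of Prophet) and the numbers are made up for illustration:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out values and forecasts, for illustration only.
actual = [21, 22, 23, 24]
predicted = [20.5, 22.5, 23.0, 26.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.75
```

With the frames above, this could be called as mean_absolute_error(df_test["y"].tolist(), forecast["yhat"].tolist()), since predict returns one yhat row per row of the input frame.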

Note that the regressor must have a value for every future (test) date you want to predict. Since you won't know the future overall sales in advance, you can first forecast overalls with a univariate model (for example, the Prophet model you already fit on the master CSV), and then pass those predicted values as the overalls column when predicting y with add_regressor.

See also this notebook, with an example of using weather factors as extra regressors in a forecast of bicycle usage.

Upvotes: 1
