LMGagne

Reputation: 1716

How to use Prophet's make_future_dataframe with multiple regressors?

make_future_dataframe seems to produce a dataframe that contains only date (ds) values. As a result, when I try to generate forecasts with the code below, I get ValueError: Regressor 'var' missing from dataframe.

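from prophet import Prophet  # older releases: from fbprophet import Prophet

m = Prophet()
m.add_country_holidays(country_name='US')
m.add_regressor('var')
m.fit(df)
forecasts = m.predict(m.make_future_dataframe(periods=7))  # raises: Regressor 'var' missing from dataframe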

Looking through the Python docs, I can't find any mention of how to handle this in Prophet. Is my only option to write additional code that lags every regressor by the forecast horizon (e.g., use var at t-7 to produce a 7-day daily forecast)?
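The lagging idea from the question can be sketched in pandas. With a lag of `lag` steps, every future row's lagged value is already observed as long as the horizon is no longer than the lag. The helper `add_lagged_regressor` and the column name `var_lag7` are hypothetical. This is a minimal sketch, assuming a `df` with `ds`, `y` and `var` columns. You would then call `m.add_regressor('var_lag7')` instead of `'var'`, fit on the history rows whose lagged value is not NaN, and predict on the extended frame.

```python
import pandas as pd


def add_lagged_regressor(df, col, lag, periods, freq="D"):
    """Hypothetical helper: return (history + `periods` future rows, lag column name).

    The new column holds `col` shifted by `lag` steps, so future rows are
    already filled as long as periods <= lag.
    """
    if periods > lag:
        raise ValueError("periods must not exceed lag, or future lagged values are unknown")
    df = df.sort_values("ds").reset_index(drop=True)
    last = df["ds"].max()
    # Future dates strictly after the last observed date
    future_ds = pd.date_range(start=last, periods=periods + 1, freq=freq)[1:]
    extended = pd.concat([df, pd.DataFrame({"ds": future_ds})], ignore_index=True)
    lag_col = f"{col}_lag{lag}"
    # The first `lag` history rows become NaN and must be dropped before fitting
    extended[lag_col] = extended[col].shift(lag)
    return extended, lag_col


df = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=10, freq="D"),
    "y": range(10),
    "var": [float(i) for i in range(10)],
})
out, name = add_lagged_regressor(df, "var", lag=7, periods=7)
print(out.tail(7)[["ds", name]])
```

The trade-off is that the model learns from last week's regressor value rather than the current one, which may weaken the relationship you are trying to capture.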

Upvotes: 4

Views: 9312

Answers (1)

seeiespi

Reputation: 3828

The issue is that m.make_future_dataframe returns a dataframe (future) whose only column is the ds date column. To predict with a model that has regressors, the future dataframe also needs a column for each regressor, with a value on every row.
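A quick way to spot the problem before calling predict is to list the regressors that are absent from the future frame or that contain NaN. The helper name `missing_regressors` is hypothetical and not part of Prophet. This minimal pandas sketch assumes, as I understand Prophet's behavior, that every added regressor needs a non-null value on every row passed to predict.

```python
import pandas as pd


def missing_regressors(future, regressors):
    """Hypothetical helper: regressors that are absent from `future` or contain NaN."""
    problems = []
    for name in regressors:
        # A missing column and a partially empty column both break predict()
        if name not in future.columns or future[name].isna().any():
            problems.append(name)
    return problems


future = pd.DataFrame({"ds": pd.date_range("2024-01-07", periods=3, freq="W")})
print(missing_regressors(future, ["var"]))  # ['var']
```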

I started from my original training data, which I called regression_data. I solved the problem by forecasting the regressor variables themselves and filling those values into future_w_regressors, a dataset created by merging future with regression_data.

Assume you already have a trained model called model.

# List of regressors
regressors = ['Total Minutes', 'Sent Emails', 'Banner Active']

# My data is weekly, so I project out 1 year (52 weeks); this is the horizon I want to forecast
future = model.make_future_dataframe(periods=52, freq='W')
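For intuition, here is a rough pandas approximation of what make_future_dataframe returns. It extends past the last training date at the given frequency and, by default, prepends the history dates, producing a single ds column. The function `future_dates_like_prophet` is hypothetical and reflects my reading of Prophet's behavior, not the library's actual code.

```python
import numpy as np
import pandas as pd


def future_dates_like_prophet(history_ds, periods, freq="D", include_history=True):
    """Hypothetical approximation of make_future_dataframe: returns only a 'ds' column."""
    history_ds = pd.to_datetime(pd.Series(history_ds))
    last_date = history_ds.max()
    # Generate dates at `freq` and keep the first `periods` strictly after the last date
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > last_date][:periods]
    if include_history:
        dates = np.concatenate([history_ds.values, dates.values])
    return pd.DataFrame({"ds": dates})


hist = pd.date_range("2018-01-07", periods=4, freq="W")
fut = future_dates_like_prophet(hist, 52, freq="W")
print(fut.columns.tolist(), len(fut))  # ['ds'] 56
```

Because the result has no regressor columns, predict has nothing to read for them.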

At this point, running model.predict(future) raises the error you've been getting, so we need to add the regressors. I merge regression_data with future so that the historical rows get their observed regressor values. As you can see, the rows for future dates are still empty (at the end of the table):

# regression_data is the dataframe I used to train the model (it includes all covariates)
# merge the training data with the future dates
future_w_regressors = regression_data[regressors+['ds']].merge(future, how='outer', on='ds')
future_w_regressors

     Total Minutes   Sent Emails  Banner Active          ds
0         7.129552  9.241493e-03            0.0  2018-01-07
1         7.157242  8.629305e-14            0.0  2018-01-14
2         7.155367  8.629305e-14            0.0  2018-01-21
3         7.164352  8.629305e-14            0.0  2018-01-28
4         7.165526  8.629305e-14            0.0  2018-02-04
...            ...           ...            ...         ...
283            NaN           NaN            NaN  2023-06-11
284            NaN           NaN            NaN  2023-06-18
285            NaN           NaN            NaN  2023-06-25
286            NaN           NaN            NaN  2023-07-02
287            NaN           NaN            NaN  2023-07-09
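To see why only the tail is empty, here is a toy reproduction of the outer merge. The data is made up for illustration (three weeks of history, one regressor called `var`), and the hand-built `future` frame stands in for `model.make_future_dataframe(2, freq='W')`. The boolean mask marks exactly the rows that still need regressor values.

```python
import pandas as pd

# Hypothetical toy data: 3 weeks of history with a single regressor "var"
regression_data = pd.DataFrame({
    "ds": pd.date_range("2018-01-07", periods=3, freq="W"),
    "y": [1.0, 2.0, 3.0],
    "var": [10.0, 11.0, 12.0],
})
regressors = ["var"]

# Stand-in for make_future_dataframe: history dates plus 2 future weeks, 'ds' only
future = pd.DataFrame({"ds": pd.date_range("2018-01-07", periods=5, freq="W")})

future_w_regressors = regression_data[regressors + ["ds"]].merge(future, how="outer", on="ds")

# Rows without regressor values are exactly the forecast horizon
needs_values = future_w_regressors[regressors].isna().any(axis=1)
print(future_w_regressors)
print(int(needs_values.sum()))  # 2
```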

Solution 1: Predict Regressors

Next, I create a dataset containing only the rows with empty regressor values. I then loop through the regressors: for each one I train a naive Prophet model on its history, predict its values for the future dates, and write those predictions into the empty rows. Finally, I copy the filled values back into future_w_regressors.

# Get the segment for which we have no regressor values
# (.copy() avoids pandas' SettingWithCopyWarning when we assign into it below)
empty_future = future_w_regressors[future_w_regressors[regressors[0]].isnull()].copy()
only_future = empty_future[['ds']]

# Forecast each regressor separately and store the results in empty_future
for regressor in regressors:
    # Prep a new training dataset; Prophet expects the columns to be named 'ds' and 'y'
    train = regression_data[['ds', regressor]].rename(columns={regressor: 'y'})

    # Train a model for this regressor
    rmodel = Prophet(weekly_seasonality=False)  # disabling weekly seasonality is specific to my case
    rmodel.fit(train)
    regressor_predictions = rmodel.predict(only_future)  # rows come back in ds order

    # Replace the empty values in the empty dataset with the predicted values from the regressor model
    empty_future[regressor] = regressor_predictions['yhat'].values

# Fill in the values for all regressors in the future_w_regressors dataset
future_w_regressors.loc[future_w_regressors[regressors[0]].isnull(), regressors] = empty_future[regressors].values

Now the future_w_regressors table no longer has missing values:

future_w_regressors

     Total Minutes    Sent Emails  Banner Active          ds
0         7.129552   9.241493e-03       0.000000  2018-01-07
1         7.157242   8.629305e-14       0.000000  2018-01-14
2         7.155367   8.629305e-14       0.000000  2018-01-21
3         7.164352   8.629305e-14       0.000000  2018-01-28
4         7.165526   8.629305e-14       0.000000  2018-02-04
...            ...            ...            ...         ...
283       7.161023  -1.114906e-02       0.548577  2023-06-11
284       7.156832  -1.138025e-02       0.404318  2023-06-18
285       7.150829  -5.642398e-03       0.465311  2023-06-25
286       7.146200  -2.989316e-04       0.699624  2023-07-02
287       7.145258   1.568782e-03       0.962070  2023-07-09

Now I can run predict to get forecasts that extend into 2023 (my original data ended in 2022):

model.predict(future_w_regressors)

    ds  trend   yhat_lower  yhat_upper  trend_lower trend_upper Banner Active   Banner Active_lower Banner Active_upper Sent Emails Sent Emails_lower   Sent Emails_upper   Total Minutes   Total Minutes_lower Total Minutes_upper additive_terms  additive_terms_lower    additive_terms_upper    extra_regressors_additive   extra_regressors_additive_lower extra_regressors_additive_upper yearly  yearly_lower    yearly_upper    multiplicative_terms    multiplicative_terms_lower  multiplicative_terms_upper  yhat
0   2018-01-07  2.118724    2.159304    2.373065    2.118724    2.118724    0.000000    0.000000    0.000000    3.681765e-04    3.681765e-04    3.681765e-04    0.076736    0.076736    0.076736    0.152302    0.152302    0.152302    0.077104    0.077104    0.077104    0.075198    0.075198    0.075198    0.0 0.0 0.0 2.271026
1   2018-01-14  2.119545    2.109899    2.327498    2.119545    2.119545    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077034    0.077034    0.077034    0.098945    0.098945    0.098945    0.077034    0.077034    0.077034    0.021911    0.021911    0.021911    0.0 0.0 0.0 2.218490
2   2018-01-21  2.120366    2.074524    2.293829    2.120366    2.120366    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077014    0.077014    0.077014    0.064139    0.064139    0.064139    0.077014    0.077014    0.077014    -0.012874   -0.012874   -0.012874   0.0 0.0 0.0 2.184506
3   2018-01-28  2.121187    2.069461    2.279815    2.121187    2.121187    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077110    0.077110    0.077110    0.050180    0.050180    0.050180    0.077110    0.077110    0.077110    -0.026931   -0.026931   -0.026931   0.0 0.0 0.0 2.171367
4   2018-02-04  2.122009    2.063122    2.271638    2.122009    2.122009    0.000000    0.000000    0.000000    3.437872e-15    3.437872e-15    3.437872e-15    0.077123    0.077123    0.077123    0.046624    0.046624    0.046624    0.077123    0.077123    0.077123    -0.030498   -0.030498   -0.030498   0.0 0.0 0.0 2.168633
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
283 2023-06-11  2.062645    2.022276    2.238241    2.045284    2.078576    0.025237    0.025237    0.025237    -4.441732e-04   -4.441732e-04   -4.441732e-04   0.077074    0.077074    0.077074    0.070976    0.070976    0.070976    0.101867    0.101867    0.101867    -0.030891   -0.030891   -0.030891   0.0 0.0 0.0 2.133621
284 2023-06-18  2.061211    1.975744    2.199376    2.043279    2.077973    0.018600    0.018600    0.018600    -4.533835e-04   -4.533835e-04   -4.533835e-04   0.077029    0.077029    0.077029    0.025293    0.025293    0.025293    0.095176    0.095176    0.095176    -0.069883   -0.069883   -0.069883   0.0 0.0 0.0 2.086504
285 2023-06-25  2.059778    1.951075    2.162531    2.041192    2.077091    0.021406    0.021406    0.021406    -2.247903e-04   -2.247903e-04   -2.247903e-04   0.076965    0.076965    0.076965    0.002630    0.002630    0.002630    0.098146    0.098146    0.098146    -0.095516   -0.095516   -0.095516   0.0 0.0 0.0 2.062408
286 2023-07-02  2.058344    1.953027    2.177666    2.039228    2.076373    0.032185    0.032185    0.032185    -1.190929e-05   -1.190929e-05   -1.190929e-05   0.076915    0.076915    0.076915    0.006746    0.006746    0.006746    0.109088    0.109088    0.109088    -0.102342   -0.102342   -0.102342   0.0 0.0 0.0 2.065090
287 2023-07-09  2.056911    1.987989    2.206830    2.037272    2.075110    0.044259    0.044259    0.044259    6.249949e-05    6.249949e-05    6.249949e-05    0.076905    0.076905    0.076905    0.039813    0.039813    0.039813    0.121226    0.121226    0.121226    -0.081414   -0.081414   -0.081414   0.0 0.0 0.0 2.096724
288 rows × 28 columns

Note that I trained each regressor's model naively, with default settings apart from weekly seasonality. If you need better regressor forecasts, you can tune those models individually.
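One way to make those per-regressor models easy to swap out is to pass the forecasting step in as a function. `fill_future_regressors` and `last_value_forecast` are hypothetical helpers, sketched under the assumption that the history frame has a `ds` column plus one column per regressor. The naive last-value forecaster is only a runnable placeholder. The Prophet version in the trailing comment mirrors the loop above but is not executed here.

```python
import pandas as pd


def fill_future_regressors(future_w_regressors, history, regressors, forecast_fn):
    """Hypothetical helper: return a copy with NaN regressor values filled.

    forecast_fn(train, future_ds) must return one prediction per row of future_ds,
    where train has columns ['ds', 'y'] and future_ds has a single 'ds' column.
    """
    out = future_w_regressors.copy()
    for name in regressors:
        missing = out[name].isna()
        if not missing.any():
            continue  # nothing to fill for this regressor
        train = history[["ds", name]].rename(columns={name: "y"})
        future_ds = out.loc[missing, ["ds"]]
        out.loc[missing, name] = list(forecast_fn(train, future_ds))
    return out


def last_value_forecast(train, future_ds):
    """Naive baseline: repeat the last observed value for every future date."""
    return [train["y"].iloc[-1]] * len(future_ds)


# A Prophet-based forecaster matching the loop above might look like this
# (assumes prophet is installed; not executed here):
#
# from prophet import Prophet
#
# def prophet_forecast(train, future_ds):
#     m = Prophet(weekly_seasonality=False)
#     m.fit(train)
#     return m.predict(future_ds)["yhat"].values
```

Keeping the forecaster separate means you can tune or replace each regressor's model without touching the filling logic.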

Solution 2: Use last year's regressor values

Alternatively, you may not want to compound the uncertainty of the regressor forecasts into your main forecast, and only want an idea of how the forecast changes for different regressor values. In that case, you can copy the regressor values from the last year into the missing rows of future_w_regressors. This also makes it easy to simulate drops or increases relative to current regressor levels:

from datetime import timedelta

last_date = regression_data.iloc[-1]['ds']  # assumes regression_data is sorted by ds
one_year_ago = last_date - timedelta(days=365)  # a date offset, so it works for daily, weekly or monthly data

last_year_of_regressors = regression_data.loc[regression_data['ds'] > one_year_ago, regressors]

# If you want to simulate a 10% drop in levels compared to this year
last_year_of_regressors = last_year_of_regressors * 0.9

# Fill the empty future rows with last year's values, in order
# (this requires the forecast horizon to be no longer than one year of history)
missing = future_w_regressors[regressors[0]].isnull()
future_w_regressors.loc[missing, regressors] = last_year_of_regressors.iloc[:missing.sum()].values
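If the forecast horizon is longer than one year of history, the slice above runs out of rows. A minimal sketch that repeats last year's values cyclically instead is shown below. The helper `fill_with_last_year` and its `scale` parameter are hypothetical; `scale` plays the role of the 0.9 multiplier above. It assumes `history` has a `ds` column and that the future rows are in date order.

```python
import numpy as np
import pandas as pd


def fill_with_last_year(future_w_regressors, history, regressors, scale=1.0):
    """Hypothetical helper: fill NaN regressor rows by cycling through last year's values."""
    out = future_w_regressors.copy()
    missing = out[regressors].isna().any(axis=1)
    n_missing = int(missing.sum())
    if n_missing == 0:
        return out
    last_date = history["ds"].max()
    window = history.loc[history["ds"] > last_date - pd.Timedelta(days=365), regressors]
    template = window.to_numpy(dtype=float) * scale
    reps = -(-n_missing // len(template))  # ceiling division
    # Stack enough copies of last year, then keep exactly as many rows as are missing
    out.loc[missing, regressors] = np.tile(template, (reps, 1))[:n_missing]
    return out
```

Calling it several times with different `scale` values gives a quick set of what-if scenarios to pass to model.predict.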

Upvotes: 2
