MrOhDubbs

Reputation: 28

SHAP force plot and decision plot giving wrong output for XGBClassifier model

I'm trying to produce SHAP decision plots for a small subset of predictions, but the outputs SHAP reports differ from what I get when I simply score the observations with the model, even with link='logit' in the call. Because of how this subset was selected, every decision plot should end above the expected value. However, every plot I produce ends below the expected value.

I have two models combined in a minimum ensemble, so I use a for loop to determine which model's score matches the ensemble score and produce a plot for that model. The plots for the RandomForestClassifier model come out correctly; the problem only occurs for the XGB model.

import matplotlib.pyplot as plt
import shap

rf_explainer = shap.TreeExplainer(RF_model)
xgb_explainer = shap.TreeExplainer(XGB_model)

for i in range(flagged.shape[0]):
    # Build the file-name label from the row's MultiIndex once per row
    idx = flagged.index[i]
    idxstr = str(idx[1]) + ' -- ' + idx[2].date().strftime('%Y-%m-%d') + ' -- ' + str(idx[0])

    # RF set the ensemble score: plot the class-1 SHAP values
    if flagged_preds.RF_Score.iloc[i] == flagged_preds.Ensemble_Score.iloc[i]:
        shap_value = rf_explainer.shap_values(flagged.iloc[i, :])
        shap.decision_plot(rf_explainer.expected_value[1], shap_value[1], show=False)
        plt.savefig(f'//PathToFolder/{idxstr} -- RF.jpg', format='jpg', bbox_inches='tight', facecolor='white')
        plt.close()  # close here so a tied XGB plot is not drawn onto the same figure

    # XGB set the ensemble score: its SHAP values are in log-odds, so display with link='logit'
    if flagged_preds.XGB_Score.iloc[i] == flagged_preds.Ensemble_Score.iloc[i]:
        shap_value = xgb_explainer.shap_values(flagged.iloc[i, :])
        shap.decision_plot(xgb_explainer.expected_value, shap_value, link='logit', show=False)
        plt.savefig(f'//PathToFolder/{idxstr} -- XGB.jpg', format='jpg', bbox_inches='tight', facecolor='white')
        plt.close()

As mentioned before, every observation I'm concerned with scores above 0.5, but that isn't what my SHAP plots show. Here is an example:

Example Decision Plot

This plot shows an output of about 0.1, but when I score this observation with predict_proba, I get 0.608.
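It can help to put these two numbers on the same scale. A binary XGBClassifier (objective binary:logistic) works in log-odds, which is also the scale of TreeExplainer's SHAP values and expected_value for that model; link='logit' only maps that scale through the logistic (sigmoid) function for display. A small sketch of the conversion in plain Python, where logit and sigmoid are illustrative helper names rather than SHAP functions:

```python
import math

def logit(p):
    """Probability -> log-odds (the raw margin scale of a binary:logistic model)."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Log-odds -> probability; the mapping link='logit' applies to the plot axis."""
    return 1.0 / (1.0 + math.exp(-x))

# The two values quoted above, expressed on the log-odds scale
print(round(logit(0.608), 4))  # predict_proba result
print(round(logit(0.1), 4))    # approximate decision plot output
```

This prints 0.4389 and -2.1972, so the plotted value and the predict_proba value are about 2.6 log-odds apart, far more than a rounding difference.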

I can't really provide a reprex because of the sensitive nature of the data, and I'm not sure what the underlying issue is.

Any feedback would be very welcome. Thank you.



Relevant environment details (Python version and pip freeze items):

Python 3.7.3

matplotlib==3.0.3

shap==0.30.1

xgboost==0.90

Upvotes: 1

Views: 4190

Answers (1)

f_g

Reputation: 21

I suggest directly comparing your model's output with your SHAP output. The second code example in the section "Changing the SHAP base value" of the SHAP decision plot documentation shows how to sum SHAP values to match the model output for a LightGBM model. You can use the same approach for any other model. If the summed SHAP values don't match the model output, the problem is not in the plotting. The code example is copied below; both lines should print the same value.

# The model's raw prediction for the first observation.
print(model.predict(features.iloc[[0]].values, raw_score=True)[0].round(4))

# The corresponding sum of the mean + shap values
print((expected_value + shap_values[0].sum()).round(4))
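The same check carries over to the XGBoost model in the question. Where the LightGBM example uses raw_score=True, the XGBClassifier counterpart is predict(..., output_margin=True), which returns the log-odds margin that TreeExplainer explains. Below is a minimal sketch assuming a binary:logistic objective; check_additivity and margin_to_proba are hypothetical helper names, and the commented lines show how they might be called with the question's flagged, XGB_model and xgb_explainer objects.

```python
import math

def check_additivity(raw_margin, expected_value, shap_row, tol=1e-3):
    """Return True if expected_value + sum(SHAP values) reproduces the raw margin."""
    reconstructed = expected_value + sum(shap_row)
    return abs(reconstructed - raw_margin) <= tol

def margin_to_proba(margin):
    """Logistic function: binary:logistic margin (log-odds) -> class-1 probability."""
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical usage with the question's objects (needs xgboost and shap; not run here):
#   row = flagged.iloc[[i]]
#   raw = XGB_model.predict(row, output_margin=True)[0]
#   sv = xgb_explainer.shap_values(row)[0]
#   check_additivity(raw, xgb_explainer.expected_value, sv)
#   margin_to_proba(raw)  # should match XGB_model.predict_proba(row)[0, 1]

if __name__ == "__main__":
    # Toy numbers for illustration only, not taken from the question's data
    print(check_additivity(0.5, 0.2, [0.1, 0.25, -0.05]))  # True
    print(check_additivity(0.5, 0.2, [0.1, -0.25]))        # False
```

If check_additivity returns False for the flagged rows, the explainer and the model already disagree before anything is plotted, so changing the decision_plot arguments will not fix it.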

Upvotes: 2
