Shap force plot and decision plot giving wrong output for XGBClassifier model

Question

I'm trying to produce SHAP decision plots for a small subset of predictions, but the outputs SHAP reports differ from what I get when I use the model to predict directly, even with link = 'logit' in the call. Given the subset I'm plotting, every decision plot should end at a value greater than the expected value. However, every plot produced shows a predicted value lower than the expected value.

I have two models in a minimum ensemble, so I use a for loop to decide which model to produce a plot for. The plots for the RandomForestClassifier model come out correct; the problem occurs only with the XGB model.

import matplotlib.pyplot as plt
import shap

# One TreeExplainer per model in the ensemble.
rf_explainer = shap.TreeExplainer(RF_model)
xgb_explainer = shap.TreeExplainer(XGB_model)

for i in range(flagged.shape[0]):
    # Plot the RF explanation when the RF score is the ensemble score.
    if flagged_preds.RF_Score[i] == flagged_preds.Ensemble_Score[i]:
        idx = flagged.index[i]
        idxstr = idx[1].astype('str') + ' -- ' + idx[2].date().strftime('%Y-%m-%d') + ' -- ' + idx[0].astype('str')
        shap_value = rf_explainer.shap_values(flagged.iloc[i, :])
        # The RF explainer returns one array per class; index 1 is the positive class.
        shap.decision_plot(rf_explainer.expected_value[1], shap_value[1], show=False)
        plt.savefig(f'//PathToFolder/{idxstr} -- RF.jpg', format='jpg', bbox_inches='tight', facecolor='white')

    # Plot the XGB explanation when the XGB score is the ensemble score.
    if flagged_preds.XGB_Score[i] == flagged_preds.Ensemble_Score[i]:
        idx = flagged.index[i]
        idxstr = idx[1].astype('str') + ' -- ' + idx[2].date().strftime('%Y-%m-%d') + ' -- ' + idx[0].astype('str')
        shap_value = xgb_explainer.shap_values(flagged.iloc[i, :])
        # The XGB explainer returns a single array here; link='logit' is meant to show probabilities.
        shap.decision_plot(xgb_explainer.expected_value, shap_value, link='logit', show=False)
        plt.savefig(f'//PathToFolder/{idxstr} -- XGB.jpg', format='jpg', bbox_inches='tight', facecolor='white')
    # Close the current figure before the next observation.
    plt.close()

As mentioned above, each observation I'm concerned with should score above 0.5, but that isn't what I'm seeing in my SHAP plots. Here is an example:

Example Decision Plot

This plot shows an output of about 0.1, but when I score this observation with predict_proba, I get a value of 0.608.
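To put the size of that mismatch in the units the explainer works in, the two numbers can be converted to log-odds. This is only an arithmetic sketch: it assumes both 0.1 and 0.608 are probabilities of the positive class, and the helper names logit and sigmoid are illustrative, not part of shap or xgboost.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


plot_output = 0.1      # approximate value read off the decision plot
model_output = 0.608   # value returned by predict_proba

# Difference between the two outputs, measured in log-odds.
gap_in_log_odds = logit(model_output) - logit(plot_output)

print(f"plot output in log-odds:  {logit(plot_output):.3f}")
print(f"model output in log-odds: {logit(model_output):.3f}")
print(f"gap in log-odds:          {gap_in_log_odds:.3f}")
```

A gap of this size is far larger than rounding or a misread axis would explain, which suggests the plotted values and the scored values are not describing the same prediction.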

I can't provide a reprex because the data is sensitive, and I'm not sure what the underlying issue is.
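One check that needs no real data is whether the base value plus the SHAP values reproduces the model's own score. The sketch below assumes the SHAP values for the XGB model are in log-odds, so their sum with the base value should pass through a sigmoid to give the predicted probability. The function additivity_gap and all numbers are hypothetical and used only for illustration.

```python
import math


def sigmoid(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def additivity_gap(base_value, shap_row, predicted_proba):
    """Rebuild the probability from base value + SHAP values (assumed log-odds)
    and return it with its absolute difference from the model's probability."""
    margin = base_value + sum(shap_row)
    reconstructed = sigmoid(margin)
    return reconstructed, abs(reconstructed - predicted_proba)


# Hypothetical inputs, not taken from the real model.
base_value = -1.2
shap_row = [0.9, 0.5, -0.1, 0.34]
predicted = sigmoid(base_value + sum(shap_row))  # stands in for predict_proba

reconstructed, gap = additivity_gap(base_value, shap_row, predicted)
print(f"reconstructed={reconstructed:.4f} predicted={predicted:.4f} gap={gap:.2e}")
```

With the objects from the loop above, the inputs would presumably be xgb_explainer.expected_value, the row returned by xgb_explainer.shap_values(...), and the positive-class column of XGB_model.predict_proba(...). A large gap would point at the explainer and the model disagreeing, rather than at the plotting call.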

Any feedback would be very welcome. Thank you.

Relevant versions (Python and pip freeze items):

Python 3.7.3
matplotlib==3.0.3
shap==0.30.1
xgboost==0.90
