The Great

Reputation: 7743

export SHAP waterfall plot to dataframe

I am working on a binary classification with a random forest model (and neural networks), and I am using SHAP to explain the model predictions. I followed the tutorial and wrote the code below to get the waterfall plot:

row_to_show = 20
data_for_prediction = ord_test_t.iloc[row_to_show]  # one row of data; multiple rows work too
data_for_prediction_array = data_for_prediction.values.reshape(1, -1)
rf_boruta.predict_proba(data_for_prediction_array)

explainer = shap.TreeExplainer(rf_boruta)
# Calculate SHAP values
shap_values = explainer.shap_values(data_for_prediction)
shap.plots._waterfall.waterfall_legacy(explainer.expected_value[0], shap_values[0],
                                       ord_test_t.iloc[row_to_show])

This generated the plot as shown below

(waterfall plot image)

However, I want to export this to a dataframe. How can I do it?

I expect my output to look like the example below, and I want to export this for the full dataframe. Can you help me, please?

(expected output table image)

Upvotes: 2

Views: 2375

Answers (3)

Nikaido

Reputation: 4629

If I recall correctly, you can do something like this with pandas:

import pandas as pd

shap_values = explainer.shap_values(data_for_prediction)
shap_values_df = pd.DataFrame(shap_values)

To get the feature names (if data_for_prediction is a dataframe), you can do something like this:

feature_names = data_for_prediction.columns.tolist()
shap_df = pd.DataFrame(shap_values, columns=feature_names)
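As a self-contained sketch of that packing step using only pandas (the sv array and feature names below are made-up stand-ins for the real output of explainer.shap_values):

```python
import numpy as np
import pandas as pd

# Stand-in for the SHAP values of one class:
# one row per sample, one column per feature.
rng = np.random.default_rng(0)
sv = rng.normal(size=(5, 3))
feature_names = ["age", "income", "tenure"]  # hypothetical feature names

# Pack the array into a labelled dataframe, one SHAP column per feature.
shap_df = pd.DataFrame(sv, columns=feature_names)
print(shap_df.shape)  # one row per sample, one column per feature
```

The same call works for the full test set: pass the 2-D SHAP array for all rows instead of a single row.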

Upvotes: 2

Sergey Bushmanov

Reputation: 25249

Let's do a small experiment:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from shap import TreeExplainer

# as_frame=True gives a DataFrame, so X.columns is available below
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(max_depth=5, n_estimators=100).fit(X, y)
explainer = TreeExplainer(model)

What is explainer here? If you do dir(explainer), you'll find it has a number of methods and attributes, among which is:

explainer.expected_value

which is of interest to you because this is the base value on top of which the SHAP values add up.

Furthermore:

sv = explainer.shap_values(X)
len(sv)

will give a hint that sv is a list of 2 objects, which are most probably the SHAP values for classes 1 and 0. They must be symmetric, because whatever moves the prediction towards 1 moves it by exactly the same amount, but with the opposite sign, towards 0.

Hence:

sv1 = sv[1]

Now you have everything needed to pack it into the desired format:

df = pd.DataFrame(sv1, columns=X.columns)
df.insert(0, 'bv', explainer.expected_value[1])
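Carrying the base value along makes it easy to sanity-check SHAP's additivity property: the base value plus a row's SHAP values should reconstruct the model output for that row. A toy check with made-up numbers (bv and sv1 here are hypothetical stand-ins, not real explainer output):

```python
import numpy as np
import pandas as pd

bv = 0.3                                 # hypothetical base value (expected_value[1] stand-in)
sv1 = np.array([[0.10, -0.05, 0.25],     # hypothetical per-feature SHAP values,
                [-0.20, 0.15, 0.05]])    # one row per sample

df = pd.DataFrame(sv1, columns=["f0", "f1", "f2"])
df.insert(0, "bv", bv)  # prepend the base value as its own column

# base value + per-row SHAP values should equal the model output per row
reconstructed = df["bv"] + df[["f0", "f1", "f2"]].sum(axis=1)
print(reconstructed.tolist())
```

With real SHAP values, reconstructed would match the model's predicted probability for class 1 on each row.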

Q: How do I know?
A: Read docs and source code.

Upvotes: 2

Gwendal Yviquel

Reputation: 392

I'm currently using this:

import shap
import pandas as pd

def getShapReport(classifier, X_test):
    shap_values = shap.TreeExplainer(classifier).shap_values(X_test)
    shap.summary_plot(shap_values, X_test)      # summary over all classes
    shap.summary_plot(shap_values[1], X_test)   # summary for the positive class
    return pd.DataFrame(shap_values[1])

It first displays the SHAP summary for the model, then the per-prediction summary for the positive class, and finally returns the dataframe of SHAP values for the positive class (I'm in an imbalanced context).

It uses summary plots from a TreeExplainer rather than a waterfall plot, but the exported values are basically the same.
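One possible refinement (a sketch with stand-in names, not part of the answer above): passing X_test's index and columns through to the returned dataframe keeps each SHAP row and column labelled with its sample and feature:

```python
import numpy as np
import pandas as pd

# Hypothetical test set and SHAP values (stand-ins for X_test and shap_values[1]).
X_test = pd.DataFrame(np.zeros((4, 2)), columns=["f0", "f1"],
                      index=[10, 11, 12, 13])
sv_pos = np.full((4, 2), 0.5)

# Reuse X_test's labels so each SHAP value lines up with its sample and feature.
report = pd.DataFrame(sv_pos, index=X_test.index, columns=X_test.columns)
```

That way report.loc[10, "f0"] is directly the SHAP contribution of feature f0 for sample 10.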

Upvotes: 1
