BeginnersMindTruly

Reputation: 731

Converting XGBoost Shapley values to SHAP's Explanation object

I am trying to convert XGBoost Shapley values into a SHAP explainer object. Using the example [here][1] with the built-in SHAP library takes days to run (even on a subsampled dataset), while the XGBoost library takes a few minutes. However, I would like to output a beeswarm graph similar to the one displayed in the example [here][2].

My thought was that I could use the XGBoost library to recover the Shapley values and then plot them using the SHAP library, but the beeswarm plot requires an explainer object. How can I convert my XGBoost booster object into an explainer object?

Here's what I tried:

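import shap
import xgboost

# model is an already-fitted XGBoost scikit-learn model (e.g. XGBRegressor), not shown
booster = model.get_booster()
d_test = xgboost.DMatrix(X_test[0:100], y_test[0:100])
# Returns a plain NumPy array of contributions, not a shap.Explanation
shap_values = booster.predict(d_test, pred_contribs=True)
shap.plots.beeswarm(shap_values)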

Which returns:

TypeError: The beeswarm plot requires an `Explanation` object as the `shap_values` argument.

To clarify, I would like to create the explainer object from values generated by XGBoost's built-in library, if possible. Avoiding the shap.Explainer or shap.TreeExplainer calls is a priority because they take much longer to return (days rather than minutes).

[1]: https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Python%20Version%20of%20Tree%20SHAP.html
[2]: https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/beeswarm.html#A-simple-beeswarm-summary-plot

Upvotes: 2

Views: 1872

Answers (1)

Sergey Bushmanov

Reputation: 25249

If you're after building an Explanation object (rather than an Explainer, as stated in your question), you can do the following:

import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split

X, y = shap.datasets.california()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

d_train = xgb.DMatrix(X_train, y_train)
d_test = xgb.DMatrix(X_test, y_test)

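# "device": "cuda" needs a CUDA-enabled GPU (XGBoost >= 2.0); use "cpu" otherwise
params = {"objective": "reg:squarederror", "tree_method": "hist", "device": "cuda"}

model = xgb.train(params, d_train, 100)
# One column per feature plus a final bias column
shap_values = model.predict(d_test, pred_contribs=True)

# Drop the bias column so the values line up with the feature columns
exp = shap.Explanation(shap_values[:, :-1], data=X_test, feature_names=X.columns)
shap.summary_plot(exp)  # legacy API; draws a beeswarm-style dot plot by default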


Upvotes: 1
