Reputation: 897
I have been trying to work with the shap package. I want to determine the SHAP values from my logistic regression model. Unlike the TreeExplainer, the LinearExplainer requires a so-called masker. What exactly does this masker do, and what is the difference between the Independent and Partition maskers?
Also, I am interested in the important features of the test set. Do I then fit the masker on the training set or the test set? Below you can see a snippet of my code.
import shap
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(random_state=1)
model.fit(X_train, y_train)

masker = shap.maskers.Independent(data=X_train)
# or
masker = shap.maskers.Independent(data=X_test)

explainer = shap.LinearExplainer(model, masker=masker)
shap_val = explainer(X_test)
Upvotes: 12
Views: 10097
Reputation: 25189
The Masker class provides the background data to "train" your explainer against. I.e., in:
explainer = shap.LinearExplainer(model, masker = masker)
you're using the background data determined by the masker (you can see which data is used by accessing the masker.data
attribute). You may read more about "true to the model" vs. "true to the data" explanations here or here.
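To make the distinction concrete, here is a rough sketch (assuming X_train is the DataFrame from the question): the Independent masker perturbs features independently of one another, which corresponds to the "true to the model" view, while the Partition masker uses a hierarchical clustering of the features so that correlated features are masked together, which leans towards "true to the data":

import shap

# Independent masker: masked features are replaced with values sampled
# independently from the background data ("true to the model").
ind_masker = shap.maskers.Independent(data=X_train, max_samples=100)

# Partition masker: a hierarchical clustering of the features is built, so
# correlated features are masked as groups ("true to the data").
part_masker = shap.maskers.Partition(data=X_train, max_samples=100, clustering="correlation")

# Either way, the background actually used is stored on the masker:
print(ind_masker.data.shape)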
Given the above, calculation-wise you may do either:
masker = shap.maskers.Independent(data = X_train)
or
masker = shap.maskers.Independent(data = X_test)
explainer = shap.LinearExplainer(model, masker = masker)
but conceptually, IMO, the following makes more sense:
masker = shap.maskers.Independent(data = X_train)
explainer = shap.LinearExplainer(model, masker = masker)
This is akin to the usual train/test paradigm, where you train your model (and explainer) on the training data and try to predict (and explain) the test data.
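For the second part of the question (important features of the test set), a common way to get a global ranking is the mean absolute SHAP value per feature over the test set. A minimal, self-contained sketch, using a toy sklearn dataset purely as a stand-in for your own data:

import numpy as np
import pandas as pd
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data; replace with your own X/y.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

model = LogisticRegression(max_iter=5000, random_state=1).fit(X_train, y_train)

# Background ("training" data for the explainer) comes from the train split...
masker = shap.maskers.Independent(data=X_train)
explainer = shap.LinearExplainer(model, masker=masker)

# ...while the explanations are computed for the test split.
shap_val = explainer(X_test)

# Global importance on the test set: mean absolute SHAP value per feature.
importance = pd.Series(np.abs(shap_val.values).mean(axis=0), index=X_test.columns)
print(importance.sort_values(ascending=False).head(10))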
Unrelated to the question: an alternative to a masker, which samples background data for you, is to explicitly provide a background that allows comparing two datapoints: a reference point to compare against and the point of interest, as in this notebook. In such a manner one may find out why two seemingly similar datapoints were classified differently.
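For instance, one way to do this (a sketch reusing the model and X_test from the question) is to use a single reference row as the background, so that the SHAP values of another row read as contributions relative to that reference:

# Reference point to compare against, and the point we want to explain.
reference = X_test.iloc[[0]]
point_of_interest = X_test.iloc[[1]]

# With a one-row background, each SHAP value says how much that feature
# pushed the prediction for point_of_interest away from the reference.
explainer = shap.LinearExplainer(model, masker=shap.maskers.Independent(data=reference))
explanation = explainer(point_of_interest)
print(explanation.values)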
Upvotes: 21