Reputation: 1973
I have been playing around with a toy dataset to understand more about the shap library and its usage. I found that the feature importances from the CatBoost regressor model are different from the feature importances shown in the summary_plot from the shap library.
I am comparing the feature importances from model.feature_importances_ (computed on the X_train set) against the summary plot from the shap explainer on the X_test set.
Here is my source code -
from catboost import CatBoostRegressor
import shap
shap.initjs()
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
X,y = shap.datasets.boston()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# Train Model
model = CatBoostRegressor(iterations=300, learning_rate=0.1, random_seed=123)
model.fit(X_train, y_train, verbose=False, plot=False)
# Build a feature-importance dataframe, sorted in descending order
feat_imp_list = list(zip(model.feature_importances_, model.feature_names_))
feature_imp_df = pd.DataFrame(
    sorted(feat_imp_list, key=lambda x: x[0], reverse=True),
    columns=['feature_value', 'feature_name']
)
feature_imp_df
# Run shap explainer on X_test set and draw the summary plot
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
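For context, the summary plot orders features by the mean of the absolute SHAP values across the rows of X_test, which is a different statistic from the model's internal importance score, so the two rankings need not agree. A minimal numpy sketch of that ranking logic, using a made-up SHAP matrix and placeholder feature names rather than the real output above:

```python
import numpy as np

# Hypothetical SHAP matrix: 4 samples x 3 features (illustrative values only)
shap_values = np.array([
    [ 0.5, -0.1,  0.2],
    [-0.4,  0.3, -0.1],
    [ 0.6, -0.2,  0.0],
    [-0.5,  0.1,  0.3],
])
feature_names = ["CRIM", "DIS", "RM"]  # placeholder names

# The summary plot ranks features by mean |SHAP| per feature (column-wise)
mean_abs = np.abs(shap_values).mean(axis=0)
order = np.argsort(mean_abs)[::-1]
ranking = [feature_names[i] for i in order]
print(ranking)  # ['CRIM', 'DIS', 'RM'] for these made-up values
```

Computing `np.abs(shap_values).mean(axis=0)` on your real shap_values and sorting lets you reproduce the summary plot's ordering numerically and compare it side by side with model.feature_importances_.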
Why does DIS show up at rank 3 in the feature-importance ranking from the model but at rank 7 in the summary plot from the SHAP library?
Upvotes: 5
Views: 10274
Reputation: 300
Feature importances are always positive, whereas SHAP values are signed contributions attached to the independent variables (they can be both negative and positive).
Both give you results in descending order:
- In the feature-importance plot you can see it starts from the maximum and goes down to the minimum. The values are normalized so their sum is always 100 (i.e. 100%).
- For SHAP values, each value is just the contribution attached to that particular feature, and the plot is also in descending order (from the highest magnitude to the lowest). Their sum can be any real number.
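The two scales above can be illustrated with plain numpy; the numbers here are made up purely to show the contrast, not taken from the model in the question:

```python
import numpy as np

# Hypothetical raw importance scores for 3 features
raw = np.array([12.0, 4.0, 24.0])
percent = 100 * raw / raw.sum()  # importances normalized to a 0-100 scale
print(percent.sum())             # 100.0 -- always sums to 100%

# SHAP values, by contrast, are signed per-sample contributions;
# for one row they sum to f(x) - E[f(x)], which has no fixed total
shap_row = np.array([1.2, -0.4, 0.7])
print(shap_row.sum())            # ~1.5, can be any real number
```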
P.S. For intuition, you can compare these SHAP values with the coefficients from a logistic regression model.
Cheers!
Upvotes: 4