Christin Abel
UnicodeDecodeError when using SHAP 0.42 and xgboost 2.1.1

I am trying to explain my XGBoost (v2.1.1) model with SHAP (v0.42), but creating the explainer fails with this error:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[53], line 1
----> 1 shap_explainer = shap.TreeExplainer(model)

File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:149, in Tree.__init__(self, model, data, model_output, feature_perturbation, feature_names, approximate, **deprecated_options)
    147 self.feature_perturbation = feature_perturbation
    148 self.expected_value = None
--> 149 self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
    150 self.model_output = model_output
    151 #self.model_output = self.model.model_output # this allows the TreeEnsemble to translate model outputs types by how it loads the model

File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:859, in TreeEnsemble.__init__(self, model, data, data_missing, model_output)
    857 self.original_model = model.get_booster()
    858 self.model_type = "xgboost"
--> 859 xgb_loader = XGBTreeModelLoader(self.original_model)
    860 self.trees = xgb_loader.get_trees(data=data, data_missing=data_missing)
    861 self.base_offset = xgb_loader.base_score

File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:1444, in XGBTreeModelLoader.__init__(self, xgb_model)
   1442 self.read_arr('i', 29) # reserved
   1443 self.name_obj_len = self.read('Q')
-> 1444 self.name_obj = self.read_str(self.name_obj_len)
   1445 self.name_gbm_len = self.read('Q')
   1446 self.name_gbm = self.read_str(self.name_gbm_len)

File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:1566, in XGBTreeModelLoader.read_str(self, size)
   1565 def read_str(self, size):
-> 1566     val = self.buf[self.pos:self.pos+size].decode('utf-8')
   1567     self.pos += size
   1568     return val

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x98 in position 1155: invalid start byte 
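The failing call in `read_str` is a plain UTF-8 decode of a slice of the raw model buffer, so the same kind of error can be reproduced without SHAP or XGBoost. The following is only a minimal sketch: `try_decode` and `fake_buf` are hypothetical names, and the idea that the loader is reading past the real objective name (for example because the length it read does not match the model file's layout) is an assumption, not something the traceback confirms.

```python
def try_decode(raw: bytes) -> str:
    """Decode bytes as UTF-8, or describe where decoding failed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte within `raw`
        return f"failed at byte {exc.start}: {exc.reason}"


# Hypothetical buffer: a valid objective name followed by binary data.
# 0x98 is a UTF-8 continuation byte, so it cannot start a character.
fake_buf = b"reg:squarederror" + bytes([0x98]) + b"rest"

print(try_decode(fake_buf[:16]))  # slice with the correct length decodes
print(try_decode(fake_buf))       # slice that is too long hits 0x98
```

The position in the error message is an index into the slice being decoded, so "position 1155" suggests the slice was far longer than an objective name would normally be.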

I am running:

import shap
from xgboost import XGBRegressor
model = XGBRegressor().fit(X_train, y_train)
explainer = shap.TreeExplainer(model)

Is this an incompatibility between the two versions? The older fixes suggested in "Getting UnicodeDecodeError when using shap on xgboost" unfortunately don't work for me.

The SHAP release notes say that version 0.45.0 "fixed XGBoost model load". Could that be the fix for this error? If there is a fix that doesn't require upgrading, I'd prefer that.
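To check whether an environment is below the release mentioned above, the installed versions can be compared with the standard library alone. This is a rough sketch: `parse_version`, `has_loader_fix`, and `installed` are hypothetical helpers, and treating 0.45.0 as the first fixed release is an assumption taken from the release note quoted above, not a confirmed diagnosis.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '0.42' or '0.45.0' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '0.42'
    return tuple(parts)


def has_loader_fix(shap_version, fixed_in="0.45.0"):
    # Assumption: the loader fix first shipped in `fixed_in`.
    return parse_version(shap_version) >= parse_version(fixed_in)


def installed(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


for pkg in ("shap", "xgboost"):
    print(pkg, installed(pkg))
```

With the versions from this question, `has_loader_fix("0.42")` returns `False`, which is consistent with the release note but does not prove that upgrading resolves the error.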
