I am trying to explain my XGBoost (v2.1.1) model with SHAP (v0.42), but I get this error:
UnicodeDecodeError Traceback (most recent call last)
Cell In[53], line 1
----> 1 shap_explainer = shap.TreeExplainer(model)
File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:149, in Tree.__init__(self, model, data, model_output, feature_perturbation, feature_names, approximate, **deprecated_options)
147 self.feature_perturbation = feature_perturbation
148 self.expected_value = None
--> 149 self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
150 self.model_output = model_output
151 #self.model_output = self.model.model_output # this allows the TreeEnsemble to translate model outputs types by how it loads the model
File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:859, in TreeEnsemble.__init__(self, model, data, data_missing, model_output)
857 self.original_model = model.get_booster()
858 self.model_type = "xgboost"
--> 859 xgb_loader = XGBTreeModelLoader(self.original_model)
860 self.trees = xgb_loader.get_trees(data=data, data_missing=data_missing)
861 self.base_offset = xgb_loader.base_score
File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:1444, in XGBTreeModelLoader.__init__(self, xgb_model)
1442 self.read_arr('i', 29) # reserved
1443 self.name_obj_len = self.read('Q')
-> 1444 self.name_obj = self.read_str(self.name_obj_len)
1445 self.name_gbm_len = self.read('Q')
1446 self.name_gbm = self.read_str(self.name_gbm_len)
File ~/anaconda3/envs/geoproc/lib/python3.8/site-packages/shap/explainers/_tree.py:1566, in XGBTreeModelLoader.read_str(self, size)
1565 def read_str(self, size):
-> 1566 val = self.buf[self.pos:self.pos+size].decode('utf-8')
1567 self.pos += size
1568 return val
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x98 in position 1155: invalid start byte
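The last frame can be reproduced in isolation, which may help in reading the traceback. The sketch below assumes that the loader is slicing a raw model buffer at fixed offsets and that the slice is not the text it expects (for example because the serialized layout differs between XGBoost versions; this is an assumption, not something the traceback proves). The buffer contents are made up for illustration, and `read_str` here is a standalone copy of the two-line helper shown above, not the shap class method.

```python
# Hypothetical buffer: four ASCII bytes followed by 0x98.
# 0x98 is a UTF-8 continuation byte, so it cannot begin a character.
buf = b"reg:" + bytes([0x98]) + b"abc"


def read_str(buf, pos, size):
    """Decode `size` bytes starting at `pos` as UTF-8, like the helper in the traceback."""
    return buf[pos:pos + size].decode("utf-8")


try:
    read_str(buf, 0, len(buf))
    error_message = None
except UnicodeDecodeError as exc:
    # Same error class and wording as in the traceback, with a different position.
    error_message = str(exc)

print(error_message)
```

If that reading is right, the failure is about how the buffer is parsed rather than about the training data or feature names.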
I am running:
import shap
from xgboost import XGBRegressor
model = XGBRegressor().fit(X_train, y_train)
explainer = shap.TreeExplainer(model)
Is this an incompatibility between the two versions? The older fixes suggested here (Getting UnicodeDecodeError when using shap on xgboost) unfortunately don't work.
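To rule out a mismatch between the versions I think I have and the ones the notebook kernel actually imports, a small check like the following can be run in the same environment. It is only a sketch using the standard library; `installed_version` and `version_tuple` are hypothetical helper names, and the tuple parsing deliberately ignores suffixes such as `rc1`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.42.1' into (0, 42, 1), stopping at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


for name in ("shap", "xgboost"):
    print(name, installed_version(name))

# Tuples compare element by element, so versions can be ordered directly.
print(version_tuple("0.42") < version_tuple("0.45.0"))
```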
The SHAP release notes say that version 0.45.0 "fixed XGBoost model load". Could that be it? If there is a fix that doesn't require upgrading, though, I'd prefer that.