Piyush Makhija

Reputation: 304

Does XGBoost's relative feature importance vary with the datapoints in the test set?

I'm working on a binary classification dataset and applying an XGBoost model to the problem. Once the model is trained, I plot the feature importance and one of the trees from the underlying boosted ensemble. Please find these plots below.

[Image: feature importance plot] [Image: plot of one tree from the model]

Questions

Upvotes: 2

Views: 1816

Answers (1)

jared_mamrot

Reputation: 26695

What do you mean by "datapoint"? Is a datapoint a single case/subject/patient/etc.? If so:

  1. The feature importance plot and the tree you plotted both describe only the trained model; they are independent of the test set. Finding out which features were important in categorising a specific subject/case/datapoint in the test set is a more challenging task (see e.g. XGBoostExplainer / https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211).

  2. The ordering and relative importance of each feature are different for each subject/case/datapoint (see above), and there is no 'class activation map' in xgboost - all data is analysed, and data that is deemed 'not important' does not contribute to the final decision.
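To make the distinction in the two points above concrete, here is a minimal pure-Python sketch. It is a toy, not XGBoost's internals or the XGBoostExplainer code: the tree, the feature names (`age`, `income`), the thresholds, and the node values are all made up for illustration. The idea it shows is that a model-level importance (here, a simple split count) is read off the tree structure alone, while a per-datapoint breakdown follows one row's path from root to leaf and credits each change in node value to the feature that was split on.

```python
# Toy tree with hypothetical features, thresholds and node values.
# Internal nodes split on a feature; every node stores a value
# (think of it as the model's output for rows that reach that node).
TREE = {
    "feature": "age", "threshold": 30, "value": 0.0,
    "left": {
        "feature": "income", "threshold": 50, "value": -0.4,
        "left": {"value": -0.9},
        "right": {"value": 0.1},
    },
    "right": {"value": 0.6},
}


def split_counts(node, counts=None):
    """Model-level importance: how often each feature is used to split.

    Only the tree is inspected, so no test data is involved.
    """
    counts = {} if counts is None else counts
    if "feature" in node:
        counts[node["feature"]] = counts.get(node["feature"], 0) + 1
        split_counts(node["left"], counts)
        split_counts(node["right"], counts)
    return counts


def path_contributions(node, row):
    """Per-datapoint attribution along the row's root-to-leaf path.

    Each step's change in node value is credited to the split feature.
    """
    contribs = {}
    while "feature" in node:
        feature = node["feature"]
        go_left = row[feature] < node["threshold"]
        child = node["left"] if go_left else node["right"]
        delta = child["value"] - node["value"]
        contribs[feature] = contribs.get(feature, 0.0) + delta
        node = child
    return contribs


# Same answer whatever the test set looks like.
print(split_counts(TREE))

# Different rows take different paths, so their breakdowns differ.
print(path_contributions(TREE, {"age": 25, "income": 40}))
print(path_contributions(TREE, {"age": 45, "income": 40}))
```

In this toy, the second row never reaches the `income` split, so `income` gets no credit for that row even though it counts towards the model-level importance.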

EDIT

Further example of XGBoostExplainer: example_1.png

Upvotes: 1
