Reputation: 304
I'm working on a binary classification dataset and applying xgBoost model to the problem. Once the model is ready, I plot the feature importance and one of the trees resulting from the underlying random forests. Please find these plots below.
Questions
Upvotes: 2
Views: 1816
Reputation: 26695
What do you mean by "datapoint"? Is a datapoint a single case/subject/patient/etc? If so;
The feature importance plot and the tree you plotted both relate only to the model, they are independent of the test set. Finding out which features were important in categorising a specific subject/case/datapoint in the test set is a more challenging task (see e.g. XGBoostExplainer / https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211).
The ordering and relative importance of each feature are different for each subject/case/datapoint (see above), and there is no 'class activation map' in xgboost - all data is analysed and data that is deemed 'not important' does not contribute final decision.
Further example of XGBoostExplainer:
Upvotes: 1