Reputation: 59
I have trained an XGBoost binary classifier and I would like to extract feature importance for each observation I give to the model (I already have the global feature importances).
More specifically, I am looking for a way to determine, for each instance given to the model, which features have the most impact on making the input belong to one class or the other. I would like something like the top 5 features that push the observation towards a class, along with an indication of how I should modify these features so that the probability of belonging to that class increases or decreases.
For example, let’s say my model predicts whether a house costs more than 100,000 dollars (the positive class) based on its location, surface area and number of bedrooms. I give it the following input: London, 400 square feet, 4 bedrooms, and my model predicts a probability of 56% that the house is in the positive class. I am looking for a Python module or function that would show the most influential features for each observation.
Upvotes: 1
Views: 1882
Reputation: 6368
There are several different methods for that. You can use the native importance measures from the xgboost library. Check this answer: https://stackoverflow.com/a/51645066/3733974
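For per-observation explanations specifically, xgboost can also return a per-feature contribution for each individual prediction via `pred_contribs=True`. Below is a minimal sketch, assuming `model` is your already-trained `XGBClassifier`, that the feature names match those used during training, and using a hypothetical encoded version of the house example from the question:

```python
import numpy as np
import xgboost as xgb

# Hypothetical feature names and one encoded observation (London, 400 sq ft, 4 bedrooms).
feature_names = ["location", "surface", "bedrooms"]
X = np.array([[1.0, 400.0, 4.0]])

dmatrix = xgb.DMatrix(X, feature_names=feature_names)

# pred_contribs=True returns one contribution per feature per observation,
# plus a bias term in the last column. Positive values push the prediction
# towards the positive class, negative values push it away.
contribs = model.get_booster().predict(dmatrix, pred_contribs=True)

# Rank features by absolute contribution for the first observation.
row = contribs[0, :-1]  # drop the bias column
top = sorted(zip(feature_names, row), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in top[:5]:
    print(f"{name}: {value:+.4f}")
```

The sign of each contribution tells you which direction that feature pushed this particular prediction, which is close to what you describe wanting for the top 5 features.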
You can also look at alternative methods. Here are two that I can recommend:
Upvotes: 3