Reputation: 6532
I am using a Logistic Regression (in scikit) for a binary classification problem, and am interested in being able to explain each individual prediction. To be more precise, I'm interested in predicting the probability of the positive class, and having a measure of the importance of each feature for that prediction.
Using the coefficients (betas) as a measure of importance is generally a bad idea, as answered here, but I have yet to find a good alternative.
So far the best I have found are the following 3 options:
All options (using betas, Monte Carlo and "Leave-one-out") seem like poor solutions to me.
Actual question: What is the best way to interpret the importance of each feature, at the moment of a decision, with a linear classifier?
Quick note #1: for Random Forests this is trivial, we can simply use the prediction + bias decomposition, as explained beautifully in this blog post. The problem here is how to do something similar with linear classifiers such as Logistic Regression.
Quick note #2: there are a number of related questions on stackoverflow (1 2 3 4 5). I have not been able to find an answer to this specific question.
Upvotes: 7
Views: 2867
Reputation: 2816
I suggest using eli5, which already has similar things implemented.
For your question: Actual question: What is the best way to interpret the importance of each feature, at the moment of a decision, with a linear classifier?
I would say the answer comes from the function show_weights() from eli5.
Furthermore, this can be used with many other classifiers.
For more info, see the related question.
Upvotes: 0
Reputation: 1480
If you want the importance of the features for a particular decision, why not simulate the decision_function (which is provided by scikit-learn, so you can test whether you get the same value) step by step? The decision function for linear classifiers is simply:
intercept_ + coef_[0]*feature[0] + coef_[1]*feature[1] + ...
The importance of a feature i is then just coef_[i]*feature[i]. Of course, this is similar to looking at the magnitude of the coefficients, but since each coefficient is multiplied by the actual feature value, and this is also what happens under the hood, it might be your best bet.
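As a minimal sketch of that step-by-step decomposition, assuming a binary scikit-learn LogisticRegression fitted on synthetic data (the dataset and the names x, w, b and contributions are made up for illustration): note that for a binary model coef_ has shape (1, n_features), so the per-feature weights live in coef_[0].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, purely illustrative (an assumption, not from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)

x = X[0]                  # the single sample whose prediction we want to explain
w = clf.coef_[0]          # per-feature weights, shape (n_features,)
b = clf.intercept_[0]     # scalar intercept

# Per-feature contribution to the decision function (the log-odds).
contributions = w * x
score = b + contributions.sum()

# Sanity check: the manual sum reproduces scikit-learn's decision_function.
assert np.isclose(score, clf.decision_function(x.reshape(1, -1))[0])

# The positive-class probability is the sigmoid of that score.
proba = 1.0 / (1.0 + np.exp(-score))
assert np.isclose(proba, clf.predict_proba(x.reshape(1, -1))[0, 1])

# Rank features by the absolute size of their contribution for this sample.
order = np.argsort(-np.abs(contributions))
for i in order:
    print(f"feature {i}: {contributions[i]:+.4f}")
```

The contributions are additive on the log-odds scale, not on the probability scale, so they say how much each feature pushes the score up or down rather than how many probability points it adds. They also depend on feature scaling, so comparing them is most meaningful when the features have been standardized.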
Upvotes: 2