Reputation: 479
I have a Gaussian naive Bayes algorithm running against a dataset. What I need is to get the feature importance (the impact of each feature) on the target class.
Here's my code:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)
gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, y_train)
gaussian_nb.score(X_test, y_test)*100
And I tried:
importance = gaussian_nb.coefs_ # and even tried coef_
and it gives an error:
AttributeError: 'GaussianNB' object has no attribute 'coefs_'
Can someone please help me?
Upvotes: 4
Views: 11580
Reputation: 5164
GaussianNB does not offer an intrinsic method to evaluate feature importances. Naïve Bayes methods work by estimating the conditional and unconditional probabilities associated with the features and predicting the class with the highest probability. Thus, no coefficients are computed for or associated with the features you used to train the model (compare with its documentation).
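To make the point above concrete, here is a minimal sketch of what a fitted GaussianNB exposes. The toy data (two classes of 50 points each, three features) is an assumption made up for illustration and is not the asker's dataset.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: class 0 centred at 0.0, class 1 centred at 2.0.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 3)),
    rng.normal(2.0, 1.0, size=(50, 3)),
])
y = np.array([0] * 50 + [1] * 50)

model = GaussianNB().fit(X, y)

# Learned class priors, one value per class.
print(model.class_prior_)
# Per-class feature means, shape (n_classes, n_features).
print(model.theta_)
# There is no coefficient vector to read importances from.
print(hasattr(model, "coef_"))
```

The model stores per-class statistics for each feature rather than one weight per feature, which is why looking up coef_ fails.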
That being said, there are methods that you can apply post hoc to analyze the model after it has been trained. One of these is permutation importance, which is conveniently implemented in scikit-learn. With the code you provided as a base, you would use permutation_importance the following way:
from sklearn.inspection import permutation_importance
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)
gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, y_train)
imps = permutation_importance(gaussian_nb, X_test, y_test)
print(imps.importances_mean)
Note that permutation importance is dataset dependent: you have to pass a dataset to obtain the values. This can be either the same data you used to train the model, i.e. X_train and y_train, or a hold-out set that you saved for evaluation, like X_test and y_test. The latter is the better choice if you care about how well the model generalizes.
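As a rough, self-contained illustration of the hold-out approach, the sketch below runs permutation importance on synthetic data and ranks the features. The dataset from make_classification and the choices of n_repeats=10 and random_state=0 are assumptions for the example, not values from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for `inputs` and `target` (assumed, not the asker's data).
# With shuffle=False the informative features are columns 0 and 1.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

gaussian_nb = GaussianNB().fit(X_train, y_train)

# Shuffle each feature n_repeats times on the hold-out set and record
# how much the score drops each time.
result = permutation_importance(
    gaussian_nb, X_test, y_test, n_repeats=10, random_state=0
)

# Print features from most to least important, with the spread over repeats.
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```

Reporting the standard deviation alongside the mean helps to judge whether a small importance is distinguishable from noise.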
If you want to know more about permutation importance as a method and how it works, the user guide provided by scikit-learn is a good start.
Upvotes: 7
Reputation: 2019
If you have a look at the documentation, Naive Bayes does not have these attributes for feature importance. You can use the get_params method to see the priors setting (the learned class priors are in class_prior_), but that tells you nothing about individual features. If you need to understand feature importance, a good solution would be to do that analysis with something like a decision tree and then fit GaussianNB using the most important features.
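One possible way to sketch the decision tree suggestion is shown below. The synthetic data and the choice of keeping the top 3 features (top_k) are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumed, not from the question).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Rank features with a decision tree's impurity-based importances.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
top_k = 3  # hypothetical cut-off; tune it for your data
selected = np.argsort(tree.feature_importances_)[::-1][:top_k]

# Train GaussianNB on the selected columns only.
nb = GaussianNB().fit(X_train[:, selected], y_train)
score = nb.score(X_test[:, selected], y_test)
print("selected feature indices:", selected)
print("GaussianNB accuracy on selected features:", score)
```

Keep in mind that these importances describe what the tree found useful, which is not necessarily what matters most to the naive Bayes model.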
Upvotes: 0