user13456401

Reputation: 479

How to get the feature importance in Gaussian Naive Bayes

I have a Gaussian naive Bayes algorithm running against a dataset. What I need is to get the feature importance (the impact of each feature) on the target class.

Here's my code:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, y_train)
gaussian_nb.score(X_test, y_test)*100

And I tried:

importance = gaussian_nb.coefs_ # and even tried coef_

and it gives an error:

AttributeError: 'GaussianNB' object has no attribute 'coefs_'

Can someone please help me?

Upvotes: 4

Views: 11580

Answers (2)

afsharov

Reputation: 5164

GaussianNB does not offer an intrinsic method to evaluate feature importances. Naïve Bayes methods work by estimating the class priors and the per-feature conditional probabilities, and then predicting the class with the highest posterior probability. Thus, no coefficients are computed for or associated with the features you used to train the model (see its documentation).
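What the fitted model does expose are the learned per-class statistics. A minimal sketch, assuming a fitted gaussian_nb as in your code (note that var_ is called sigma_ in scikit-learn versions before 1.0):

# Per-class statistics that GaussianNB actually learns
print(gaussian_nb.class_prior_)  # P(class) for each class
print(gaussian_nb.theta_)        # mean of each feature per class, shape (n_classes, n_features)
print(gaussian_nb.var_)          # variance of each feature per class, shape (n_classes, n_features)

These describe the fitted Gaussians, but they are not importance scores.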

That being said, there are methods that you can apply post hoc to analyze the model after it has been trained. One of these is permutation importance, which is conveniently implemented in scikit-learn. With the code you provided as a base, you would use permutation_importance as follows:

from sklearn.inspection import permutation_importance
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, y_train)

# Shuffle each feature in turn and measure the resulting drop in score
imps = permutation_importance(gaussian_nb, X_test, y_test)
print(imps.importances_mean)

Observe that permutation importance is dataset dependent: you have to pass a dataset to obtain the values. This can be the same data you used to train the model, i.e. X_train and y_train, or a hold-out set that you saved for evaluation, such as X_test and y_test. The latter is, however, the superior choice with regard to generalization.
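If you want a ranked view, you can sort the mean importances. A small sketch building on the result above (the feature names here are hypothetical placeholders; substitute the column names of your own inputs):

import numpy as np

# Hypothetical names -- replace with your own column names
feature_names = [f"feature_{i}" for i in range(X_test.shape[1])]

# Print features from most to least important, with spread across repeats
for i in np.argsort(imps.importances_mean)[::-1]:
    print(f"{feature_names[i]}: "
          f"{imps.importances_mean[i]:.3f} +/- {imps.importances_std[i]:.3f}")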

If you want to know more about permutation importance as a method and how it works, the user guide provided by scikit-learn is definitely a good start.

Upvotes: 7

nickyfot

Reputation: 2019

If you have a look at the documentation, Naive Bayes does not have these attributes for feature importance. You can inspect the learned class priors through the class_prior_ attribute, but there are no per-feature coefficients. If you need to understand feature importance, a good solution would be to do that analysis with something like a decision tree and then fit GaussianNB using the most important features, as sketched below.
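A minimal sketch of that idea, assuming inputs and target as in the question and that inputs is a NumPy array (with a DataFrame you would index columns with .iloc); keeping the top 5 features is an arbitrary choice:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)

# Rank features with a decision tree's impurity-based importances
tree = DecisionTreeClassifier().fit(X_train, y_train)
top = np.argsort(tree.feature_importances_)[::-1][:5]  # 5 most important features

# Refit GaussianNB on the selected columns only
gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train[:, top], y_train)
print(gaussian_nb.score(X_test[:, top], y_test))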

Upvotes: 0
