DSouthy

Reputation: 189

sklearn important features error when using logistic regression

The following code works using a random forest model to give me a chart showing feature importance:

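import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel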

clf = RandomForestClassifier()
clf = clf.fit(X_train,y_train)
clf.feature_importances_  
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)

matplotlib.rc('figure', figsize=[5,5])
plt.style.use('ggplot')

feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh',title = 'Feature Importance')


However, I need to do the same for a logistic regression model. The following code produces an error:

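import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression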

clf = LogisticRegression()
clf = clf.fit(X_train,y_train)
clf.feature_importances_  
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)

matplotlib.rc('figure', figsize=[5,5])
plt.style.use('ggplot')

feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh',title = 'Feature Importance')

I get

AttributeError: 'LogisticRegression' object has no attribute 'feature_importances_'

Can someone help me see where I am going wrong?

Upvotes: 5

Views: 7362

Answers (1)

Inputvector

Reputation: 1093

Logistic regression has no attribute for ranking features. If you want to visualize feature importance, you can use the model's coefficients instead. The assumption is that a coefficient with a larger magnitude contributes more to the model, but this only holds when all features are ON THE SAME SCALE; otherwise the assumption is not valid. Note that some coefficients can be negative, so the plot will look different from yours. If you want to order them the way you did in your plot, convert them to absolute values first.

After you fit the logistic regression model, you can visualize the coefficients:

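from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression()
logistic_model.fit(X, Y)
# coef_ has shape (1, n_features) for a binary problem; take the first row
importance = logistic_model.coef_[0]
# importance is a NumPy array; label it with the feature names (assumes X is a DataFrame)
feat_importances = pd.Series(importance, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')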

The result is a horizontal bar chart of the coefficients (the original image is not reproduced here).

Note: You can also run statistical tests or a correlation analysis on your features to understand their contribution to the model. Which test to use depends on your data type (categorical, numerical, etc.).
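To make the points about scaling and negative coefficients concrete, here is a minimal sketch rather than a definitive recipe. It assumes a binary problem and uses scikit-learn's built-in breast cancer dataset purely as stand-in data; the names `pipe`, `coefs`, `top20`, and `plot_top` are mine, not from the question. The features are standardized first so the coefficient magnitudes are comparable, and the ranking uses absolute values.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): replace with your own X_train / y_train DataFrame and labels.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Standardize first so coefficient magnitudes are comparable across features.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# For a binary problem coef_ has shape (1, n_features); take the single row
# and label it with the column names.
coefs = pd.Series(pipe.named_steps["logisticregression"].coef_[0], index=X.columns)

# Rank by absolute value so strongly negative coefficients are not dropped.
top20 = coefs.abs().nlargest(20)


def plot_top(series):
    """Draw a horizontal bar chart similar to the one in the question."""
    import matplotlib.pyplot as plt

    ax = series.sort_values().plot(kind="barh", title="Feature Importance (|coefficient|)")
    plt.tight_layout()
    return ax
```

Calling `plot_top(top20)` draws the chart. For a multiclass model, `coef_` has one row per class, so you would need to pick a class or aggregate across rows before ranking.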

Upvotes: 7
