Reputation: 189
The following code works using a random forest model to give me a chart showing feature importance:
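import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
# Fit the forest; feature_importances_ is available once the model is fitted
clf = RandomForestClassifier()
clf = clf.fit(X_train, y_train)
clf.feature_importances_
# Keep only the features the fitted model considers important
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)
# Plot the 20 largest importances as a horizontal bar chart
matplotlib.rc('figure', figsize=[5, 5])
plt.style.use('ggplot')
feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')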
However, I need to do the same for a logistic regression model. The following code produces an error:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf = clf.fit(X_train, y_train)
# This line raises the AttributeError shown below
clf.feature_importances_
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)
matplotlib.rc('figure', figsize=[5, 5])
plt.style.use('ggplot')
feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')
I get
AttributeError: 'LogisticRegression' object has no attribute 'feature_importances_'
Can someone help me see where I am going wrong?
Upvotes: 5
Views: 7362
Reputation: 1093
Logistic regression has no attribute that ranks features directly. If you want a view of feature importance, you can plot the model's coefficients instead. The assumption is that a coefficient with a larger magnitude contributes more to the model, but this only holds when the features are ON THE SAME SCALE; otherwise the comparison is not valid. Note that some coefficients may be negative, so the plot will look different from your random forest chart. If you want to order them the way you did in your plot, you can convert them to absolute values first.
After you fit the logistic regression model, you can visualize its coefficients:
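To make the scaling point concrete, here is a minimal sketch rather than a definitive recipe. It assumes synthetic data from `make_classification` and hypothetical column names `f0` to `f5` (neither comes from the question), standardizes the features with `StandardScaler`, and ranks the coefficients by absolute value so that strongly negative ones are not pushed to the bottom.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 6 features named f0..f5.
X_arr, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# Put every feature on the same scale so coefficient sizes are comparable.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression().fit(X_scaled, y)

# coef_ has shape (1, n_features) for a binary problem; take the single row.
coefs = pd.Series(clf.coef_[0], index=X.columns)

# Rank by magnitude, largest first, ignoring the sign of each coefficient.
ranked = coefs.abs().sort_values(ascending=False)
print(ranked)
```

With real data you would replace `X` and `y` with your own training set, and `ranked.nlargest(20).plot(kind='barh')` would give a chart in the same style as the random forest one.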
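import pandas as pd
from sklearn.linear_model import LogisticRegression
logistic_model = LogisticRegression()
logistic_model.fit(X, Y)
# coef_ has shape (1, n_features) for a binary problem, so take the first row
importance = logistic_model.coef_[0]
# importance is a NumPy array; label it with the column names (assumes X is a DataFrame)
feat_importances = pd.Series(importance, index=X.columns)
feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')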
The output will be a horizontal bar chart of the largest coefficients.
Note: You can also run statistical tests or a correlation analysis on your features to understand their contribution to the model. Which test to use depends on your data types (categorical, numerical, etc.).
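As one possible example of such a test, the sketch below uses scikit-learn's `f_classif`, which computes an ANOVA F-value between each numerical feature and a categorical target. The synthetic data and the column names `f0` to `f5` are assumptions made for illustration; for non-negative count or categorical features, `chi2` from the same module might be a better fit.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Synthetic stand-in data (assumption): 200 rows, 6 numerical features.
X_arr, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# f_classif returns one F statistic and one p-value per feature.
f_scores, p_values = f_classif(X, y)

# Collect the results per feature and sort by F statistic, largest first.
report = pd.DataFrame(
    {"F": f_scores, "p_value": p_values}, index=X.columns
).sort_values("F", ascending=False)
print(report)
```

A test like this scores each feature on its own, so it complements the fitted coefficients rather than replacing them.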
Upvotes: 7