I did feature selection on my data frame following this article: https://towardsdatascience.com/feature-selection-using-random-forest-26d7b747597f
In part 7, to plot the distribution of the importances, it provides this code:
pd.series(sel.estimator_,feature_importances_,.ravel()).hist()
which I think should be written like this to avoid a syntax error:
pd.series(sel.estimator_,feature_importances_.ravel()).hist()
and I received this error:
AttributeError: module 'pandas' has no attribute 'series'
I also think estimator_ and feature_importances_ are not defined.
Is there any way to debug this line of code?
Answer:
pd.Series(sel.estimator_.feature_importances_.ravel()).hist()
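It is "Series", not "series" (pandas names are case-sensitive). Also, feature_importances_ is an attribute of the fitted estimator, so it is accessed with a dot (sel.estimator_.feature_importances_) rather than separated by commas. estimator_ itself only exists after sel has been fitted.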
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.hist.html
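A minimal, self-contained sketch of the corrected line in context. It assumes sel is a SelectFromModel wrapping a RandomForestClassifier, and it uses synthetic data from make_classification as a stand-in for the asker's data frame. It also illustrates why estimator_ looked undefined: SelectFromModel only creates that attribute when fit() is called. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# pandas names are case-sensitive: the class is Series, not series
assert hasattr(pd, "Series") and not hasattr(pd, "series")

# Synthetic stand-in for the asker's training data (assumption)
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
print(hasattr(sel, "estimator_"))  # False: estimator_ is only created by fit()
sel.fit(X, y)

# estimator_ is the fitted forest; feature_importances_ is one of its attributes
importances = sel.estimator_.feature_importances_

# Histogram of the importance values (Series.hist returns a matplotlib Axes)
ax = pd.Series(importances.ravel()).hist()
ax.set_xlabel("Feature importance")
ax.set_ylabel("Number of features")
plt.show()
```

With a real data frame, replace X and y with the training data used in the article's workflow; the plotting line itself stays the same.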
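import numpy as np
import matplotlib.pyplot as plt
# Alternative: a bar chart with the features sorted by importance.
# sel is the fitted SelectFromModel and X is the training data used to fit it.
importances = sel.estimator_.feature_importances_
indices = np.argsort(importances)[::-1]  # feature indexes, most important first
plt.figure()
plt.title("Feature importances")
plt.bar(range(X.shape[1]), importances[indices],
        color="r", align="center")
plt.xticks(range(X.shape[1]), indices)  # label each bar with its original feature index
plt.xlim([-1, X.shape[1]])
plt.show()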
This should render a bar graph where the x-axis shows the feature indexes and the y-axis shows the feature importance, with the features sorted from most to least important.
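Since the data is in a data frame, the bars can be labelled with column names instead of indexes. A minimal sketch, assuming X is a pandas DataFrame; the column names feat_0, feat_1, ... and the synthetic data are hypothetical stand-ins. Putting the importances in a Series indexed by the columns lets sort_values do the ordering that np.argsort does above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical DataFrame standing in for the asker's data; column names are made up
X_arr, y = make_classification(n_samples=200, n_features=6, n_informative=3, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(X_arr.shape[1])])

sel = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
sel.fit(X, y)

# Index the importances by column name and sort from most to least important
ranked = pd.Series(sel.estimator_.feature_importances_, index=X.columns).sort_values(ascending=False)

# Bar chart labelled with column names
ax = ranked.plot.bar(color="r", title="Feature importances")
ax.set_ylabel("Importance")
plt.tight_layout()
plt.show()
```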