CFD

Reputation: 686

distribution plot of feature importances

I did feature selection on my data frame following this article: https://towardsdatascience.com/feature-selection-using-random-forest-26d7b747597f

In part 7, to plot the distribution of the importances, the article provides this code:

pd.series(sel.estimator_,feature_importances_,.ravel()).hist()

which I think should be written like this to avoid the syntax error:

pd.series(sel.estimator_,feature_importances_.ravel()).hist()

and I received this error:

AttributeError: module 'pandas' has no attribute 'series'

I also suspect that estimator_ and feature_importances_ are not defined. How can I debug this line of code?

Upvotes: 1

Views: 2569

Answers (1)

mujjiga

Reputation: 16876

pd.Series(sel.estimator_.feature_importances_.ravel()).hist()

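The class is "Series" (capital S), not "series". Also, estimator_, feature_importances_ and ravel() are chained with dots, not commas: they are attributes of the fitted selector sel, so they exist once sel.fit(...) has been called.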

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.hist.html
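To make the corrected line reproducible, here is a minimal sketch. It assumes sel is a scikit-learn SelectFromModel wrapping a RandomForestClassifier, as in the linked article, and uses synthetic data from make_classification as a stand-in for the real data frame; the sample sizes and n_estimators value are illustrative choices, not taken from the question.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the real training data (assumption).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, random_state=0
)

# estimator_ is set by fit(); it is not available on an unfitted selector.
sel = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
sel.fit(X, y)

# One importance value per feature, taken from the fitted forest.
importances = pd.Series(sel.estimator_.feature_importances_)

# Histogram of the importance values (requires matplotlib).
ax = importances.hist()
ax.set_xlabel("importance")
ax.set_ylabel("number of features")
```

In a plain script, call matplotlib.pyplot.show() afterwards to display the figure; in a notebook it usually renders on its own.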

Plotting feature importances

import numpy as np
import matplotlib.pyplot as plt

# sel is the fitted selector; X is the train data used to fit the model
importances = sel.estimator_.feature_importances_
indices = np.argsort(importances)[::-1]  # feature indices, most important first
plt.figure()
plt.title("Feature importances")
plt.bar(range(X.shape[1]), importances[indices],
        color="r", align="center")
plt.xticks(range(X.shape[1]), indices)  # label each bar with its feature index
plt.xlim([-1, X.shape[1]])
plt.show()

This should render a bar graph where the x-axis shows the feature indices and the y-axis shows the feature importance. The features are sorted in descending order of importance.
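If the training data is a pandas DataFrame, the bars can be labelled with column names instead of integer indices. The following is only a sketch of that variation: the feat_0 ... feat_5 column names and the synthetic data are hypothetical placeholders, and sel is again assumed to be a SelectFromModel around a random forest.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data with hypothetical column names (assumption).
data, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
X = pd.DataFrame(data, columns=[f"feat_{i}" for i in range(6)])

sel = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
sel.fit(X, y)

# Pair each importance with its column name, then sort most important first.
ranked = pd.Series(
    sel.estimator_.feature_importances_, index=X.columns
).sort_values(ascending=False)

# One bar per feature, labelled by name (requires matplotlib).
ax = ranked.plot.bar(color="r", title="Feature importances")
ax.set_ylabel("importance")
```

As before, call matplotlib.pyplot.show() to display the figure when running outside a notebook.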

Upvotes: 4
