Krishan Gupta

Reputation: 3716

How are "feature_importances_" ordered in Scikit-learn's RandomForestRegressor

When I inspect a fitted model (called clf in this case), I get output that looks like this. How can I tie these values to the input features that were used to train the model?

>>> clf.feature_importances_

array([ 0.01621506,  0.18275428,  0.09963659,... ])

Upvotes: 11

Views: 30295

Answers (3)

Ahmed Taha Hagag

Reputation: 27

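The importances are in the same order as the feature columns of the data set you trained on.

You can display these importance scores next to their corresponding feature names as below (this assumes your_data_set is a pandas DataFrame holding only the training features, so that list(your_data_set) returns its column names):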

attributes = list(your_data_set)

sorted(zip(clf.feature_importances_, attributes), reverse=True)

Because of reverse=True, the output is sorted from most to least important and could look something like this:

[(0.18275428, 'feature3'),
(0.09963659, 'feature2'),
(0.01621506, 'feature1'),
...
...

Upvotes: 0

Abhishek Parida

Reputation: 35

You can store the result in a pandas DataFrame indexed by feature name (here x is the DataFrame the model was trained on):

pandas.DataFrame({'col_name': clf.feature_importances_}, index=x.columns).sort_values(by='col_name', ascending=False)

Sorting in descending order puts the most important features at the top.

Upvotes: 3

Krishan Gupta

Reputation: 3716

As mentioned in the comments, it looks like the order of the feature importances is the order of the columns in the "x" input variable (which I've converted from Pandas to a Python native data structure). I use this code to generate a list of tuples that look like this: (feature_name, feature_importance).

list(zip(x.columns, clf.feature_importances_))
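
One way to convince yourself of this ordering when the model is trained on a plain NumPy array (which carries no column names) is to move a known-important column to a different position and see the importance move with it. This is a rough sketch on synthetic data; the helper top_feature and the names "signal", "noise_a" and "noise_b" are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 300
signal = rng.normal(size=n)   # the only column the target depends on
noise_a = rng.normal(size=n)  # unrelated to the target
noise_b = rng.normal(size=n)  # unrelated to the target
y = 3.0 * signal

def top_feature(columns, names):
    """Fit on the columns in the given order; return (top name, top position)."""
    X = np.column_stack(columns)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    # Names must be kept in a separate list, in the same order as the columns.
    ranked = sorted(zip(names, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], int(np.argmax(model.feature_importances_))

first = top_feature([signal, noise_a, noise_b], ["signal", "noise_a", "noise_b"])
last = top_feature([noise_a, noise_b, signal], ["noise_a", "noise_b", "signal"])
print(first, last)
```

The largest importance should sit at position 0 in the first fit and at position 2 in the second, following wherever the "signal" column was placed.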

Upvotes: 17
