Reputation: 3716
If I run a model (called clf in this case), I get output that looks like this. How can I tie this to the feature inputs that were used to train the classifier?
>>> clf.feature_importances_
array([ 0.01621506, 0.18275428, 0.09963659,... ])
Upvotes: 11
Views: 30295
Reputation: 27
The importances are listed in the same order as the features (columns) of your training data set.
You can display each importance score next to its corresponding feature name as below:
# list() on a pandas DataFrame returns its column names
attributes = list(your_data_set)
sorted(zip(clf.feature_importances_, attributes), reverse=True)
Because of reverse=True, the output is sorted from the largest importance to the smallest, something like:
[(0.18275428, 'feature2'),
(0.09963659, 'feature3'),
(0.01621506, 'feature1'),
...
]
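To make the pairing and sort order concrete, here is a minimal sketch in plain Python. It assumes the three values visible in the question's truncated output and uses placeholder names (feature1 to feature3) in place of real column names; with a fitted model you would pass clf.feature_importances_ and your own column names instead.

```python
# Values copied from the question's truncated output; the names are placeholders.
importances = [0.01621506, 0.18275428, 0.09963659]
attributes = ["feature1", "feature2", "feature3"]

# zip pairs the two sequences by position, so importances[i] belongs to attributes[i].
# sorted(..., reverse=True) compares the tuples by score first, largest score first.
ranked = sorted(zip(importances, attributes), reverse=True)

for score, name in ranked:
    print(f"{name}: {score:.4f}")
```

Putting the score first in each tuple is what makes sorted() rank by importance rather than alphabetically by name.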
Upvotes: 0
Reputation: 35
You may save the result in a pandas data frame as follows:
pandas.DataFrame({'col_name': clf.feature_importances_}, index=x.columns).sort_values(by='col_name', ascending=False)
Sorting in descending order puts the most important features at the top.
Upvotes: 3
Reputation: 3716
As mentioned in the comments, it looks like the order of the feature importances is the order of the columns in the "x" input variable (which I've converted from Pandas to a Python native data structure). I use this code to generate a list of tuples that look like this: (feature_name, feature_importance).
# In Python 3, zip returns an iterator, so wrap it in list() to get the tuples
list(zip(x.columns, clf.feature_importances_))
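For anyone who wants to try this end to end, here is a rough sketch. The iris data set, the RandomForestClassifier and its settings are assumptions chosen only for illustration; they are not the model or data from the question. Any fitted estimator that exposes feature_importances_ should pair up with the training columns the same way.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: x is a pandas DataFrame, so the column names are available.
iris = load_iris(as_frame=True)
x, y = iris.data, iris.target

# Illustrative model choice; random_state only makes the run repeatable.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(x, y)

# The importances follow the column order of x as it was passed to fit().
pairs = list(zip(x.columns, clf.feature_importances_))

# The same pairing as a labelled Series, sorted with the largest first.
ranking = pd.Series(clf.feature_importances_, index=x.columns).sort_values(ascending=False)
print(ranking)
```

If x is converted to a NumPy array or a list before fitting, keep a copy of the column names taken in the same order, since the fitted model only knows the features by position.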
Upvotes: 17