CodeHunter

Reputation: 2082

How to print the order of important features in Random Forest regression using python?

I am trying to create a Random Forest regression model on one of my datasets. I need to find the order of importance of each variable, along with the variable names. I have tried a few things but can't achieve what I want. Below is the sample code I tried on the Boston Housing dataset:

from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np
boston = load_boston()
rf = RandomForestRegressor(max_depth=50)
idx = list(range(len(boston.target)))
np.random.shuffle(idx)
rf.fit(boston.data[:500], boston.target[:500])
instance = boston.data[[0, 5, 10]]
# predict() expects a 2D array, so pass each row as a one-row slice
print(rf.predict(instance[0:1]))
print(rf.predict(instance[1:2]))
print(rf.predict(instance[2:3]))
important_features = []
for x, i in enumerate(rf.feature_importances_):
    important_features.append(str(x))
print('Most important features:', ', '.join(important_features))

Most important features: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

If I print this:

impor = rf.feature_importances_
impor

I get below output:

array([  3.45665230e-02,   4.58687594e-04,   5.45376404e-03,
     3.33388828e-04,   2.90936201e-02,   4.15908448e-01,
     1.04131089e-02,   7.26451301e-02,   3.51628079e-03,
     1.20860975e-02,   1.40417760e-02,   8.97546838e-03,
     3.92507707e-01])

I need to get the names associated with these values and then pick the top n out of these features.

Upvotes: 8

Views: 21305

Answers (5)

Vivek Kumar

Reputation: 36599

First, you are using the wrong name for the variable: you are using important_features; use feature_importances_ instead. Second, it returns an array of shape (n_features,) that contains the feature importance values. You need to sort the features by those values to get the most important ones. See the RandomForestRegressor documentation.

Edit: Added code

important_features_dict = {}
for idx, val in enumerate(rf.feature_importances_):
    important_features_dict[idx] = val

important_features_list = sorted(important_features_dict,
                                 key=important_features_dict.get,
                                 reverse=True)

print(f'5 most important features: {important_features_list[:5]}')

This prints the indices of the five most important features in decreasing order of importance (the first is the most important, and so on).
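To get names rather than indices, the same ordering can be used to index an array of feature names. The following is only a sketch: it reuses the importances printed in the question, assumes the feature names are in the order given by boston.feature_names, and picks top_n = 5 arbitrarily. The variable names are illustrative and not part of scikit-learn.

```python
import numpy as np

# Feature names in the column order of the Boston Housing data
# (assumed to match boston.feature_names).
feature_names = np.array(["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                          "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"])

# Importances copied from the question's output; in real code this
# would be rf.feature_importances_.
importances = np.array([3.45665230e-02, 4.58687594e-04, 5.45376404e-03,
                        3.33388828e-04, 2.90936201e-02, 4.15908448e-01,
                        1.04131089e-02, 7.26451301e-02, 3.51628079e-03,
                        1.20860975e-02, 1.40417760e-02, 8.97546838e-03,
                        3.92507707e-01])

top_n = 5  # arbitrary choice for illustration

# argsort sorts ascending, so reverse it and keep the first top_n indices
order = np.argsort(importances)[::-1][:top_n]

# pair each selected name with its importance
top_features = [(str(feature_names[i]), float(importances[i])) for i in order]
for name, score in top_features:
    print(f"{name}: {score:.4f}")
```

With the values from the question, the two highest-ranked names are RM and LSTAT, which matches the two large entries (about 0.416 and 0.393) in the printed array.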

Upvotes: 14

Abuubakry Ali

Reputation: 51

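You can print the feature names in order of importance like this (here brf is assumed to be the fitted forest and X_train the pandas DataFrame it was trained on):

importances = brf.feature_importances_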

sorted_indices = np.argsort(importances)[::-1]

print(*X_train.columns[sorted_indices], sep = "\n")
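Since brf and X_train are not defined in the snippet, here is a self-contained sketch of the same idea. It swaps in the diabetes dataset bundled with scikit-learn purely as stand-in data, and the n_estimators and random_state values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Stand-in data: as_frame=True returns the features as a pandas DataFrame,
# so the column names are available via X_train.columns.
data = load_diabetes(as_frame=True)
X_train, y_train = data.data, data.target

# Fit the forest (hyperparameters chosen only for illustration)
brf = RandomForestRegressor(n_estimators=50, random_state=0)
brf.fit(X_train, y_train)

importances = brf.feature_importances_

# indices of the features, from most to least important
sorted_indices = np.argsort(importances)[::-1]

# column names in the same order
ranked_names = list(X_train.columns[sorted_indices])
print(*ranked_names, sep="\n")
```

Slicing ranked_names, for example ranked_names[:3], gives the top n feature names.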

Upvotes: 2

Abuubakry Ali

Reputation: 51

importances = rf.feature_importances_

# indices of the features, from most to least important
sorted_indices = np.argsort(importances)[::-1]

print(sorted_indices)

Upvotes: 3

Darshan Jain

Reputation: 838

With the following code, you should be able to see the features, with their names, in descending order of importance:

Create an empty list:

featureImpList = []

Run the for loop:

for feat, importance in zip(train_df.columns, clf_ggr.feature_importances_):
    # store each feature name with its importance as a percentage
    featureImpList.append([feat, importance * 100])

fT_df = pd.DataFrame(featureImpList, columns=['Feature', 'Importance'])
print(fT_df.sort_values('Importance', ascending=False))
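A more compact variant of the same idea is to put the importances in a pandas Series indexed by feature name. This is just a sketch: the names and values below are made up for illustration, and in practice they would come from the training DataFrame's columns and the model's feature_importances_.

```python
import pandas as pd

feature_names = ["f0", "f1", "f2", "f3"]   # hypothetical feature names
importances = [0.10, 0.55, 0.05, 0.30]     # hypothetical importance values

# Series indexed by feature name, so names travel with the values
series = pd.Series(importances, index=feature_names)

# full ranking, most important first
ranking = series.sort_values(ascending=False)

# only the top n features (n = 2 here)
top_2 = series.nlargest(2)

print(ranking)
print(top_2)
```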

Upvotes: 0

Dennis Graham

Reputation: 1

# list of column names from the original data
cols = data.columns
# feature importances from the fitted random forest rf
importances = rf.feature_importances_
# indices of the features, most important first
order = np.argsort(importances)[::-1]
# form a dictionary mapping importance rank to feature name
features_dict = {rank: cols[i] for rank, i in enumerate(order)}
# the dictionary keys are the importance ranks (0 = most important); the values are the feature names

Upvotes: -1
