Lilz

Reputation: 4081

sklearn random forest to find score of selected features

I am trying to find the score that each selected feature obtained, i.e. the value that made it count as relevant.

I have tried this so far:

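    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    # Feature matrix: every column except 'indicator'
    X_train = train.drop(columns='indicator')
    classifier = SelectFromModel(RandomForestClassifier(n_estimators=100))
    m = classifier.fit(X_train, train.rg_risk)
    # Names of the columns the selector kept
    selected_feat = X_train.columns[classifier.get_support()]
    len(selected_feat)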

Upvotes: 1

Views: 183

Answers (1)

Marco Cerliani

Reputation: 22021

SelectFromModel is an embedded method: it relies on an estimator that has feature selection built in, because the estimator assigns each feature a weight or importance while it is being fitted.

In your case, a RandomForestClassifier selects features based on feature importance. The importance of a feature is computed from the impurity decrease at the nodes that split on it in each decision tree, averaged over the trees of the forest.
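To make the "per tree, then averaged" idea concrete, here is a minimal sketch. It assumes synthetic data from make_classification as a stand-in for your own frame, and the variable names (forest, per_tree, averaged) are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only as a stand-in for the real training set.
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each fitted tree exposes its own impurity-based importances ...
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
# ... and averaging them over the trees gives one score per feature.
averaged = per_tree.mean(axis=0)

# Compare the manual average with what the forest reports.
print(np.allclose(averaged, forest.feature_importances_))
```

The comparison should hold as long as every tree made at least one split, which is the usual case on non-trivial data.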

With threshold=None and a random forest, the cut-off defaults to the mean of the forest's feature importances, and a feature is kept when its importance is greater than or equal to that value. Other options are "median" (same idea, using the median instead of the mean) or a scaled version of either, such as "1.25*mean" or "1.25*median".
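To read the actual scores, you can look at the fitted forest inside the selector. The sketch below assumes synthetic data and hypothetical column names (f0, f1, ...) in place of your train frame; estimator_, threshold_ and get_support() are attributes and methods of a fitted SelectFromModel.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the question's `train` frame (hypothetical data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)
X_train = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
selector.fit(X_train, y)

# Importance of every feature, taken from the forest fitted by the selector.
importances = pd.Series(selector.estimator_.feature_importances_, index=X_train.columns)

# Cut-off actually applied; threshold=None resolves to the mean for a random forest.
threshold = selector.threshold_

# Scores of the selected features only, highest first.
selected_scores = importances[selector.get_support()].sort_values(ascending=False)
print(f"threshold = {threshold:.4f}")
print(selected_scores)
```

Applied to your code, the same pattern would be classifier.estimator_.feature_importances_ indexed by X_train.columns, compared against classifier.threshold_.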

Source: scikit-learn documentation

Upvotes: 1
