I am trying to find out the score (importance) that each selected feature obtained, i.e. why it was judged relevant.
This is what I have tried so far:
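from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# `train` is my training DataFrame; rg_risk is passed as the target
X_train = train.drop(columns='indicator')
classifier = SelectFromModel(RandomForestClassifier(n_estimators=100))
classifier.fit(X_train, train.rg_risk)
selected_feat = X_train.columns[classifier.get_support()]
len(selected_feat)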
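SelectFromModel is an embedded method: it relies on an estimator that computes a score for every feature as part of fitting (feature importances or coefficients) and keeps the features whose score passes a threshold.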
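In your case, RandomForestClassifier provides those scores. The importance of a feature in one decision tree is the total (sample-weighted) decrease in node impurity, Gini by default, from the splits that use that feature; the forest's importance is the average of these values over all trees. After fitting, the scores are available as classifier.estimator_.feature_importances_.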
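To see the scores themselves, here is a minimal sketch. It assumes synthetic data from make_classification in place of the question's train frame (the column names f0 to f9 are hypothetical) and reads the importances from the forest that SelectFromModel stores in estimator_ after fitting. It also recomputes the forest score by hand as the mean of the per-tree impurity importances, rescaled to sum to 1, which matches how scikit-learn computes it as far as I know.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical stand-in for the question's `train` frame: 10 synthetic features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)
X_train = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))
selector.fit(X_train, y)

# SelectFromModel keeps the fitted forest in `estimator_`;
# its impurity-based scores are in `feature_importances_`.
forest = selector.estimator_
scores = pd.Series(forest.feature_importances_, index=X_train.columns)

# Recompute the forest score by hand: mean of the per-tree importances,
# rescaled so that the scores sum to 1.
per_tree = np.mean([tree.feature_importances_ for tree in forest.estimators_], axis=0)
per_tree = per_tree / per_tree.sum()

# Scores of the features that passed the threshold, highest first.
selected_feat = X_train.columns[selector.get_support()]
print(scores[selected_feat].sort_values(ascending=False))
```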
With threshold=None (the default), the cut-off used for a random forest is the mean of all the feature importances, and every feature whose importance is greater than or equal to that value is kept. Other options are "median" (the same rule using the median instead of the mean), a scaled reference such as "1.25*mean" or "1.25*median", or a plain float.
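A small sketch of how each threshold string turns into a number, again on hypothetical synthetic data from make_classification. For each option it fits a SelectFromModel, reads the threshold it used from threshold_, and checks that against a manual calculation (mean, median, or 1.25 times either) and the rule "keep a feature if its importance is at least the threshold". The manual rules are my reading of the documented behaviour, not scikit-learn's own code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical synthetic data standing in for the question's training set.
X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)

# Manual version of each threshold string, computed from the importance vector.
rules = {
    "mean": lambda imp: np.mean(imp),
    "median": lambda imp: np.median(imp),
    "1.25*mean": lambda imp: 1.25 * np.mean(imp),
    "1.25*median": lambda imp: 1.25 * np.median(imp),
}

results = {}
for name, rule in rules.items():
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0), threshold=name
    ).fit(X, y)
    importances = selector.estimator_.feature_importances_
    manual_threshold = rule(importances)
    # A feature is kept when its importance is >= the threshold.
    manual_mask = importances >= manual_threshold
    results[name] = (selector.threshold_, manual_threshold, selector.get_support(), manual_mask)
    print(f"{name:>12}: threshold={selector.threshold_:.4f}, kept {manual_mask.sum()} of {X.shape[1]}")
```

Raising the scale factor (for example "1.25*mean" instead of "mean") can only keep the same features or fewer, since the cut-off grows while the importances stay the same.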