pjrockzzz

Reputation: 165

Random Forest "Feature Importance"

I am currently working with RandomForestClassifier. One of its parameters is criterion, which has two options: gini or entropy. For a split, a lower Gini impurity is preferred (and likewise a lower entropy). By default, gini is the criterion for RandomForestClassifier.

There is an attribute called feature_importances_ provided by sklearn, which gives a value for each of the features. Using it, we can select some features and eliminate others with a threshold and SelectFromModel.
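As a minimal sketch of that workflow (using a synthetic dataset from make_classification; the variable names here are just for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 10 features, only 3 of them informative
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.feature_importances_)  # one score per feature, higher = more important

# Keep only the features whose importance exceeds the mean importance
selector = SelectFromModel(clf, threshold="mean", prefit=True)
X_reduced = selector.transform(X)
print(X.shape, "->", X_reduced.shape)
```

With threshold="mean", only features whose importance is at least the average importance survive, so X_reduced has fewer columns than X.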

My doubt is: on what basis are these feature_importances_ calculated? Assume the default criterion, gini, is used. If the feature_importances_ are "Gini importances", then a low value should be preferred, but for feature importances a high value is preferred.

Upvotes: 1

Views: 2080

Answers (1)

Alex Serra Marrugat

Reputation: 2042

feature_importances_ always outputs the importance of the features: the bigger the value, the more important the feature. Don't take the gini or entropy criterion into consideration here; it doesn't matter. The criterion is used to build the model. Feature importance is computed after the model is trained; you only "analyze" and observe which features have been most relevant in your trained model.

Moreover, you will see that the feature_importances_ sum to 1, so each importance can also be read as a percentage.
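A quick check of this on the iris dataset (the individual values depend on the data and the random seed, but the sum does not):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each importance is a non-negative fraction of the total, so the
# whole vector sums to 1 and can be read as percentages
total = clf.feature_importances_.sum()
print(total)
```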

Since a random forest is an ensemble of several trees, the feature importances are averaged over all the trees.
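That averaging can be reproduced by hand from the fitted trees (a sketch assuming scikit-learn's impurity-based importances; estimators_ holds the individual fitted trees):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree exposes its own normalized impurity-based importances;
# the forest-level importances are their mean over all trees
per_tree = np.array([tree.feature_importances_ for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

print(np.allclose(averaged, forest.feature_importances_))
```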

Upvotes: 2
