Devang Kulshreshtha

Reputation: 43

scikit adaboost feature_importance_

How exactly does the AdaBoost algorithm implemented in Python's scikit-learn assign feature importances to each feature? I am using it for feature selection, and my model performs better when I apply feature selection based on the values of feature_importances_.
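For context, a minimal sketch of the kind of selection I mean (synthetic data and an arbitrary importance cutoff, both just placeholders):

    # Sketch: fit AdaBoost, then keep only the features whose
    # feature_importances_ exceed a hand-picked threshold.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

    mask = clf.feature_importances_ > 0.05   # arbitrary importance cutoff
    X_selected = X[:, mask]                  # reduced feature matrix
    print("kept feature indices:", np.flatnonzero(mask))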

Upvotes: 4

Views: 7330

Answers (1)

joceratops

Reputation: 417

feature_importances_ is an attribute available on sklearn's AdaBoost classifier when the base estimator is a decision tree. To understand how feature_importances_ is calculated for the AdaBoost algorithm, you first need to understand how it is calculated for a decision tree classifier.

Decision Tree Classifier:

The feature_importances_ will vary depending on which split criterion you choose. When the criterion is set to "entropy", i.e. DecisionTreeClassifier(criterion='entropy'), the feature_importances_ are equivalent to the information gain of each feature. Here is a tutorial on how to compute the information gain of each feature (slide 7 in particular). When you change the split criterion, the feature_importances_ are no longer equivalent to the information gain, but the steps you take to calculate them are similar to those in slide 7, with the new criterion used in place of entropy.
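To make the entropy case concrete, here is a minimal sketch of that information-gain calculation on a toy, made-up feature/label pair (the arrays and the helper names entropy / information_gain are my own, not from the slides):

    import numpy as np

    def entropy(labels):
        # Shannon entropy of a label array
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(feature, labels):
        # Parent entropy minus the sample-weighted entropy of each child subset
        gain = entropy(labels)
        for value in np.unique(feature):
            subset = labels[feature == value]
            gain -= len(subset) / len(labels) * entropy(subset)
        return gain

    feature = np.array([0, 0, 1, 1, 1, 0])   # toy binary feature
    labels  = np.array([0, 0, 1, 1, 0, 1])   # toy class labels
    print(information_gain(feature, labels)) # ~0.082 for this toy data

The tree repeats this kind of calculation at every split it considers, so the importances it reports reflect how much each feature reduces the impurity across the whole tree.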

Ensemble Classifiers:

Now let's return to your original question: how is it determined for the AdaBoost algorithm? According to the docs:

This notion of importance can be extended to decision tree ensembles by simply averaging the feature importance of each tree
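As a rough sanity check of that averaging idea, the sketch below (synthetic data, default decision-stump base estimators) rebuilds the ensemble importances from the per-tree ones. sklearn's AdaBoost weights each tree by its boosting weight (estimator_weights_), which reduces to a plain average when the weights are equal, so a weighted average is used here; if the check fails on your version, try a plain mean instead:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)
    ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

    # One row of importances per fitted base estimator (decision stumps by default)
    per_tree = np.array([est.feature_importances_ for est in ada.estimators_])

    # Average across trees, weighted by each tree's boosting weight
    weights = ada.estimator_weights_[:len(ada.estimators_)]
    manual = np.average(per_tree, axis=0, weights=weights)

    print(np.allclose(manual, ada.feature_importances_))  # expected: True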

Upvotes: 5
