Reputation: 1392
I see that DecisionTreeClassifier accepts criterion='entropy', which means that it must be using information gain as its criterion for splitting the decision tree. What I need is the information gain for each feature at the root level, when it is about to split the root node.
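I know I could compute something like this by hand. The sketch below does it with a median split per feature, which is my own simplification (scikit-learn actually searches over all candidate thresholds), but I would like the values that scikit-learn itself computes:

    import numpy as np

    def entropy(y):
        # Shannon entropy (in bits) of the class labels in y
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(x, y, threshold):
        # gain of the binary split x <= threshold at the root
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            return 0.0
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        return entropy(y) - children

    # toy data: feature 0 separates the classes, feature 1 does not
    X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 10.0], [4.0, 20.0]])
    y = np.array([0, 0, 1, 1])
    for j in range(X.shape[1]):
        print(j, information_gain(X[:, j], y, np.median(X[:, j])))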
Upvotes: 14
Views: 17180
Reputation: 1971
You can only access the information gain (or Gini impurity) for a feature that has actually been used as a split node. The attribute DecisionTreeClassifier.tree_.best_error[i] holds the entropy of the i-th node splitting on feature DecisionTreeClassifier.tree_.feature[i]. If you want the entropy of all examples that reach the i-th node, look at DecisionTreeClassifier.tree_.init_error[i].
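For example (a minimal sketch; best_error and init_error are internals of the older scikit-learn version at the commit linked below, and later releases replaced them, exposing the per-node impurity as tree_.impurity instead):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(criterion='entropy').fit(iris.data, iris.target)
    t = clf.tree_

    root = 0  # the root is always node 0
    print("feature split on at root:", t.feature[root])
    print("entropy of samples reaching the root:", t.init_error[root])
    print("entropy remaining after the root split:", t.best_error[root])
    # if best_error is the post-split entropy as described above, the
    # information gain of the *chosen* split is simply the difference
    print("gain of chosen split:", t.init_error[root] - t.best_error[root])

Note that this only gives you the gain of the split that was actually chosen, not the gain of every candidate feature.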
For more information see the documentation here: https://github.com/scikit-learn/scikit-learn/blob/dacfd8bd5d943cb899ed8cd423aaf11b4f27c186/sklearn/tree/_tree.pyx#L64
If you want to access the entropy for each feature (not just the one that was chosen) at a certain split node, you need to modify the function find_best_split in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L713
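Alternatively, if you only need the per-feature gains at the root (rather than inside the Cython split search), you can avoid modifying _tree.pyx altogether: fit a depth-1 stump on each feature by itself and recover the gain from the public tree_ arrays. This is a sketch using tree_.impurity, tree_.weighted_n_node_samples and tree_.children_left/children_right, which exist in current scikit-learn versions:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    for j in range(X.shape[1]):
        # a stump restricted to one feature: its root split is the best
        # entropy split scikit-learn can find for that feature alone
        stump = DecisionTreeClassifier(criterion='entropy', max_depth=1)
        stump.fit(X[:, [j]], y)
        t = stump.tree_
        left, right = t.children_left[0], t.children_right[0]
        if left < 0:  # no useful split was found for this feature
            print("feature %d: gain 0.0 (no split)" % j)
            continue
        n = t.weighted_n_node_samples
        # gain = root entropy minus the weighted entropy of the children
        gain = t.impurity[0] - (n[left] * t.impurity[left]
                                + n[right] * t.impurity[right]) / n[0]
        print("feature %d: information gain %.4f" % (j, gain))

Because each stump is fit on the full data with only one feature visible, its root gain should match the gain the full tree would compute for that feature when choosing the root split.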
Upvotes: 10