Reputation: 1392
I see that DecisionTreeClassifier accepts criterion='entropy', which means that it must be using information gain as its criterion for splitting the decision tree. What I need is the information gain for each feature at the root level, when it is about to split the root node.
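I know I could compute something like this by hand. The sketch below does it with a median split per feature, which is my own simplification (scikit-learn actually searches over all candidate thresholds), but I would like the values that scikit-learn itself computes:

    import numpy as np

    def entropy(y):
        # Shannon entropy (in bits) of the class labels in y
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(x, y, threshold):
        # gain of the binary split x <= threshold at the root
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            return 0.0
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        return entropy(y) - children

    # toy data: feature 0 separates the classes, feature 1 does not
    X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 10.0], [4.0, 20.0]])
    y = np.array([0, 0, 1, 1])
    for j in range(X.shape[1]):
        print(j, information_gain(X[:, j], y, np.median(X[:, j])))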
Upvotes: 14
Views: 17180
Reputation: 1971
You can only access the information gain (or Gini impurity) for a feature that has actually been used as a split node. The attribute DecisionTreeClassifier.tree_.best_error[i] holds the entropy of the i-th node splitting on feature DecisionTreeClassifier.tree_.feature[i]. If you want the entropy of all examples that reach the i-th node, look at DecisionTreeClassifier.tree_.init_error[i].
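For example (a minimal sketch; best_error and init_error are internals of the older scikit-learn version at the commit linked below, and later releases replaced them, exposing the per-node impurity as tree_.impurity instead):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(criterion='entropy').fit(iris.data, iris.target)
    t = clf.tree_

    root = 0  # the root is always node 0
    print("feature split on at root:", t.feature[root])
    print("entropy of samples reaching the root:", t.init_error[root])
    print("entropy remaining after the root split:", t.best_error[root])
    # if best_error is the post-split entropy as described above, the
    # information gain of the *chosen* split is simply the difference
    print("gain of chosen split:", t.init_error[root] - t.best_error[root])

Note that this only gives you the gain of the split that was actually chosen, not the gain of every candidate feature.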
For more information see the documentation here: https://github.com/scikit-learn/scikit-learn/blob/dacfd8bd5d943cb899ed8cd423aaf11b4f27c186/sklearn/tree/_tree.pyx#L64
If you want to access the entropy for each feature (not just the one that was chosen) at a certain split node, you need to modify the function find_best_split in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L713
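Alternatively, if you only need the per-feature gains at the root (rather than inside the Cython split search), you can avoid modifying _tree.pyx altogether: fit a depth-1 stump on each feature by itself and recover the gain from the public tree_ arrays. This is a sketch using tree_.impurity, tree_.weighted_n_node_samples and tree_.children_left/children_right, which exist in current scikit-learn versions:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    for j in range(X.shape[1]):
        # a stump restricted to one feature: its root split is the best
        # entropy split scikit-learn can find for that feature alone
        stump = DecisionTreeClassifier(criterion='entropy', max_depth=1)
        stump.fit(X[:, [j]], y)
        t = stump.tree_
        left, right = t.children_left[0], t.children_right[0]
        if left < 0:  # no useful split was found for this feature
            print("feature %d: gain 0.0 (no split)" % j)
            continue
        n = t.weighted_n_node_samples
        # gain = root entropy minus the weighted entropy of the children
        gain = t.impurity[0] - (n[left] * t.impurity[left]
                                + n[right] * t.impurity[right]) / n[0]
        print("feature %d: information gain %.4f" % (j, gain))

Because each stump is fit on the full data with only one feature visible, its root gain should match the gain the full tree would compute for that feature when choosing the root split.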
Upvotes: 10