Tomas Greif

Reputation: 22661

Get split points from DecisionTreeClassifier for single numeric variable

I am trying to get the split points from a decision tree fitted on a single variable. Is the following a correct and safe way to get the splits?

from sklearn.tree import DecisionTreeClassifier
import numpy as np
x = np.array([1,2,3,4,5,6,7])
y = np.array([1,0,0,1,1,0,1])
x = x.reshape(7, -1)

clf = DecisionTreeClassifier()
clf.fit(x, y)

# My splits
np.sort(clf.tree_.threshold[clf.tree_.feature == 0])

I see only 0 and -2 in clf.tree_.feature, and my understanding is that -2 represents TREE_UNDEFINED (see here), while the other points will be leaves with some defined threshold.

Upvotes: 0

Views: 1906

Answers (1)

Gambit1614

Reputation: 8811

Your approach for obtaining the splits is correct, but you have the roles reversed: -2 marks the leaf nodes, as you can see from this code section, while the other entries are internal nodes. There is, however, a problem with this method: you cannot reconstruct the order of the splits or where they sit in the final structure of the tree.

Take a look at this official example, which is a more thorough way to get the exact split and node structure.
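
As a rough sketch of that idea (not the official example itself), the tree can be walked recursively from the root so that each split is reported together with its depth and node id. The function name walk_splits is made up for illustration; it only assumes the object passed in exposes children_left, children_right, feature and threshold, as clf.tree_ does, and that leaves are marked with -1 in the children arrays.

```python
TREE_LEAF = -1  # value scikit-learn stores in children_left/children_right for a leaf


def walk_splits(tree, node_id=0, depth=0):
    """Yield (depth, node_id, feature, threshold) for every internal node.

    `tree` is assumed to look like `clf.tree_`: it must expose the parallel
    arrays children_left, children_right, feature and threshold.
    Parents are yielded before their children, left subtree first.
    """
    left = tree.children_left[node_id]
    right = tree.children_right[node_id]
    if left == TREE_LEAF:
        # Leaf node: no split is stored here (feature/threshold hold -2).
        return
    yield depth, node_id, int(tree.feature[node_id]), float(tree.threshold[node_id])
    # Samples with value <= threshold go to the left child, the rest go right.
    yield from walk_splits(tree, left, depth + 1)
    yield from walk_splits(tree, right, depth + 1)
```

With the classifier from the question, list(walk_splits(clf.tree_)) should give one tuple per split, and sorting the last element of each tuple gives the same values as the one-liner in the question.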


Upvotes: 1
