I am trying to get the splits from a decision tree built on a single variable. Is the following a correct and safe way to get them?
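```python
from sklearn.tree import DecisionTreeClassifier
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7])
y = np.array([1, 0, 0, 1, 1, 0, 1])
x = x.reshape(-1, 1)  # sklearn expects a 2-D array: (n_samples, n_features)

clf = DecisionTreeClassifier()
clf.fit(x, y)

# My splits: thresholds of every node whose split feature is 0
print(np.sort(clf.tree_.threshold[clf.tree_.feature == 0]))
```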
I see only 0 and -2 in `clf.tree_.feature`, and my understanding is that -2 represents TREE_UNDEFINED (see here), while the other points will be leaves with some defined threshold.
Your approach to obtaining the splits is correct, but you have the roles reversed: -2 marks the leaf nodes, as you can see from this code section, while all other nodes are internal (split) nodes. Filtering on `feature == 0` therefore keeps exactly the thresholds of the internal nodes. There is, however, a problem with this method: you cannot reconstruct the order of the splits or where each one sits in the final structure of the tree.
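A minimal sketch of the leaf/internal distinction, using plain Python lists laid out like `clf.tree_.children_left`, `children_right`, `feature` and `threshold`. The array contents are hypothetical (hand-built for a 7-point, single-feature tree, not copied from a real fit). A leaf has -1 in both child arrays and -2 in `feature` and `threshold`; the last lines show that sorting the thresholds throws away the node order:

```python
# Hypothetical tree arrays in the same layout as clf.tree_ (hand-built, not real output).
# Node ids here follow depth-first order: a node, then its left subtree, then its right.
TREE_LEAF = -1       # child id stored for leaves in children_left / children_right
TREE_UNDEFINED = -2  # value stored for leaves in feature / threshold

children_left = [1, -1, 3, 4, -1, -1, 7, -1, -1]
children_right = [2, -1, 6, 5, -1, -1, 8, -1, -1]
feature = [0, -2, 0, 0, -2, -2, 0, -2, -2]
threshold = [1.5, -2.0, 5.5, 3.5, -2.0, -2.0, 6.5, -2.0, -2.0]

n_nodes = len(feature)
leaves = [i for i in range(n_nodes) if children_left[i] == TREE_LEAF]
internal = [i for i in range(n_nodes) if feature[i] != TREE_UNDEFINED]

# Both tests agree: feature == -2 exactly for the leaves, every other node splits.
assert set(leaves).isdisjoint(internal) and len(leaves) + len(internal) == n_nodes

# Thresholds in node order vs. the sorted list the question produces.
in_node_order = [threshold[i] for i in internal]
sorted_splits = sorted(in_node_order)

print("leaves:", leaves)
print("internal:", internal)
print("node order:", in_node_order)
print("sorted:", sorted_splits)
```

The sorted list still gives the cut points along x, but it no longer tells you which split is the root or which split is nested under which.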
Take a look at this official example, which shows a more thorough way to get the exact splits and the node structure.
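The official example walks the tree from the root with an explicit stack. A rough sketch of the same idea on hypothetical arrays (same layout as `clf.tree_`, hand-built, single feature only) records each split with its depth and, for every leaf, the range of x that reaches it. Samples with x <= threshold go to the left child:

```python
import math

# Hypothetical arrays laid out like clf.tree_.children_left / children_right /
# feature / threshold for a small single-feature tree (not real sklearn output).
children_left = [1, -1, 3, 4, -1, -1, 7, -1, -1]
children_right = [2, -1, 6, 5, -1, -1, 8, -1, -1]
feature = [0, -2, 0, 0, -2, -2, 0, -2, -2]
threshold = [1.5, -2.0, 5.5, 3.5, -2.0, -2.0, 6.5, -2.0, -2.0]


def walk(left, right, feat, thr):
    """Depth-first walk from the root (node 0).

    Returns the splits in visiting order as (node, depth, feature, threshold)
    and a dict mapping each leaf to the interval (low, high] of x reaching it.
    The interval bookkeeping assumes every split uses the same single feature.
    """
    splits = []
    leaf_ranges = {}
    stack = [(0, 0, -math.inf, math.inf)]  # (node, depth, low, high)
    while stack:
        node, depth, low, high = stack.pop()
        if left[node] == right[node]:  # both -1: this node is a leaf
            leaf_ranges[node] = (low, high)
            continue
        t = thr[node]
        splits.append((node, depth, feat[node], t))
        # Push the right child (x > t) first so the left child (x <= t) is visited next.
        stack.append((right[node], depth + 1, max(low, t), high))
        stack.append((left[node], depth + 1, low, min(high, t)))
    return splits, leaf_ranges


splits, leaf_ranges = walk(children_left, children_right, feature, threshold)
for node, depth, f, t in splits:
    print(f"{'  ' * depth}node {node}: x[{f}] <= {t}")
for leaf, (low, high) in sorted(leaf_ranges.items()):
    print(f"leaf {leaf}: {low} < x <= {high}")
```

With a fitted model, passing `clf.tree_.children_left`, `clf.tree_.children_right`, `clf.tree_.feature` and `clf.tree_.threshold` should work the same way, since they are arrays indexed by node id.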