whoAmI

Reputation: 368

Where does scikit-learn hold the decision labels of each leaf node in its tree structure?

I have trained a random forest model using scikit-learn, and now I want to save its tree structures to a text file so I can use them elsewhere. According to this link, a tree object consists of a number of parallel arrays, each holding some information about the nodes of the tree (e.g. left child, right child, which feature it examines, ...). However, there seems to be no information about the class label corresponding to each leaf node! It's not even mentioned in the examples provided in the link above.

Does anyone know where the class labels are stored in the scikit-learn decision tree structure?

Upvotes: 10

Views: 5028

Answers (1)

boot-scootin

Reputation: 12515

Take a look at the docs for sklearn.tree.DecisionTreeClassifier.tree_.value:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()

clf.fit(iris.data, iris.target)

print(clf.classes_)

[0 1 2]

print(clf.tree_.value)

[[[ 50.  50.  50.]]

 [[ 50.   0.   0.]]

 [[  0.  50.  50.]]

 [[  0.  49.   5.]]

 [[  0.  47.   1.]]

 [[  0.  47.   0.]]

 [[  0.   0.   1.]]

 [[  0.   2.   4.]]

 [[  0.   0.   3.]]

 [[  0.   2.   1.]]

 [[  0.   2.   0.]]

 [[  0.   0.   1.]]

 [[  0.   1.  45.]]

 [[  0.   1.   2.]]

 [[  0.   0.   2.]]

 [[  0.   1.   0.]]

 [[  0.   0.  43.]]]

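Each row in clf.tree_.value "contains the constant prediction value of each node" (per help(clf.tree_)). For a classifier, that is the per-class tally of the training samples that reached the node, and its columns correspond index-to-index to clf.classes_. The label predicted at a leaf is therefore the class whose column holds the largest value.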
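To make the mapping from tree_.value to labels concrete, here is a minimal sketch that collects the predicted label of every leaf. The helper name leaf_labels is made up for illustration and is not part of scikit-learn; it assumes a single-output classifier and relies on children_left being -1 at leaf nodes. Depending on the scikit-learn version, the entries of tree_.value may be sample counts or class fractions, but the argmax is the same either way.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def leaf_labels(tree, classes):
    """Hypothetical helper: map each leaf node id to its predicted class label."""
    # scikit-learn marks a leaf by setting its left child to -1.
    is_leaf = tree.children_left == -1
    # tree.value has shape (n_nodes, n_outputs, n_classes); use the only output.
    best_class_index = np.argmax(tree.value[:, 0, :], axis=1)
    return {
        int(node): classes[best_class_index[node]]
        for node in np.flatnonzero(is_leaf)
    }


iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

labels = leaf_labels(clf.tree_, clf.classes_)
for node, label in sorted(labels.items()):
    print(node, label)
```

For a random forest, the same helper could be applied to each tree in forest.estimators_. In that case it is probably safer to pass forest.classes_ rather than the sub-tree's own classes_, since the sub-trees are fitted on integer-encoded targets.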

See this answer for (barely) more details.

Upvotes: 8
