Reputation: 91
I'm doing some feature induction with decision trees and would like to know the size of the tree in terms of number of nodes. How do I do that in Python?
Using the stock example from sklearn's website,
from sklearn.ensemble import RandomForestClassifier
x = [[0, 0], [0, 1]]
y = [0, 1]
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(x, y)
I can get at individual trees with something like clf[1], clf[...], but how can I determine the size of each tree in terms of the total number of nodes?
Upvotes: 2
Views: 5553
Reputation: 41
A sklearn.tree._tree.Tree object has a node_count property:
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
treeObj = clf.tree_
print(treeObj.node_count)
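To see what node_count is counting, here is a minimal sketch. It assumes scikit-learn 0.21 or later (where get_n_leaves() exists) and uses the bundled iris dataset purely for illustration. Because scikit-learn trees are strictly binary, every split node has exactly two children, so the node count should equal twice the number of leaves minus one.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a single tree on the bundled iris data (illustrative dataset choice).
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

tree_obj = clf.tree_
n_nodes = tree_obj.node_count   # split nodes + leaves
n_leaves = clf.get_n_leaves()   # leaves only

# Leaves are the nodes whose left child id is -1 (no child).
n_leaves_manual = int((tree_obj.children_left == -1).sum())

print(n_nodes, n_leaves)
```

The per-node arrays such as children_left have exactly node_count entries, so their length is another way to read off the size.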
Upvotes: 4
Reputation: 1
Based on the previous answers, the correct code for a random forest in scikit-learn would be:
nodeNumber = sum(est.tree_.node_count for est in clf.estimators_)
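A small sketch that keeps the per-tree sizes as well as the total, using the question's toy data. It relies on estimators_ (the list of fitted trees) and on the forest supporting clf[i] indexing, as the question does; random_state is set only to make the run repeatable.

```python
from sklearn.ensemble import RandomForestClassifier

x = [[0, 0], [0, 1]]
y = [0, 1]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(x, y)

# Node count of every tree in the forest, in estimator order.
sizes = [est.tree_.node_count for est in clf.estimators_]
# Total number of nodes across the whole forest.
node_number = sum(sizes)
# Indexing the forest returns the same fitted tree as estimators_[1].
second_size = clf[1].tree_.node_count

print(sizes, node_number)
```

With only two training samples, each bootstrapped tree is either a single leaf (1 node) or one split with two leaves (3 nodes), so the exact list depends on the bootstrap draws.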
Upvotes: 0
Reputation: 5962
Max depth is a pretty useful metric, which I didn't find in the API, so I wrote this:
def dectree_max_depth(tree):
    # tree is a fitted sklearn.tree._tree.Tree, e.g. clf.tree_
    children_left = tree.children_left
    children_right = tree.children_right
    def walk(node_id):
        # A leaf has no children: both of its child ids are -1
        if children_left[node_id] != children_right[node_id]:
            left_max = 1 + walk(children_left[node_id])
            right_max = 1 + walk(children_right[node_id])
            return max(left_max, right_max)
        else:  # leaf
            return 1
    root_node_id = 0
    return walk(root_node_id)
You can use it on all trees in a forest (rf) like this:
[dectree_max_depth(t.tree_) for t in rf.estimators_]
BSD license.
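For what it's worth, recent scikit-learn versions expose depth directly: the low-level Tree has a max_depth attribute, and since 0.21 fitted trees also have get_depth(). Both count edges, so a root-only tree has depth 0, whereas dectree_max_depth above counts the nodes on the longest path and returns one more. The sketch below is an assumed iterative variant (not the author's code) that avoids Python's recursion limit on very deep trees and is checked against the built-in value; iris is used only as sample data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def dectree_max_depth_iterative(tree):
    """Nodes on the longest root-to-leaf path (a lone root leaf counts as 1)."""
    children_left = tree.children_left
    children_right = tree.children_right
    best = 0
    stack = [(0, 1)]  # (node_id, number of nodes on the path down to it)
    while stack:
        node_id, depth = stack.pop()
        left, right = children_left[node_id], children_right[node_id]
        if left != right:  # split node: both children exist
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
        else:  # leaf: both child ids are -1
            best = max(best, depth)
    return best


X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

depths = [dectree_max_depth_iterative(t.tree_) for t in rf.estimators_]
builtin = [t.tree_.max_depth for t in rf.estimators_]
print(depths, builtin)
```

The explicit stack trades the elegance of the recursive walk for robustness; on typical trees either version is fine.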
Upvotes: 0
Reputation: 1571
As with any tree in any language, each node's size is 1 plus the sum of the sizes of all its subtrees.
In Python, apply this function to the root:
def size(tree):
    return 1 + sum(size(subtree) for subtree in tree.subtrees)
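To try the generic recursion on its own, here is a sketch with a hypothetical Node class whose subtrees attribute is a plain list of child nodes (an assumption for illustration; sklearn trees do not store nodes this way).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type: any tree whose nodes expose .subtrees works.
    subtrees: List["Node"] = field(default_factory=list)


def size(tree):
    # A node counts itself plus every node in each of its subtrees.
    return 1 + sum(size(subtree) for subtree in tree.subtrees)


# root -> (a -> (c, d)), b
root = Node([Node([Node(), Node()]), Node()])
print(size(root))  # 5
```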
Specifically for sklearn, looking at the source code here [https://github.com/scikit-learn/scikit-learn/tree/master/sklearn], I think this could be tried:
nodeNumber = sum(len(est.tree_.value) for est in clf.estimators_)
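A quick check of why this works, assuming a fitted forest: tree_.value holds one row per node, so its length equals node_count. The sketch also applies the recursive size idea from above to sklearn's parallel children_left/children_right arrays; the helper name sklearn_tree_size is made up here, and iris is used only as sample data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def sklearn_tree_size(tree, node_id=0):
    # Same recursion as size(): 1 for this node plus the size of each subtree.
    left = tree.children_left[node_id]
    right = tree.children_right[node_id]
    if left == right:  # leaf: both child ids are -1
        return 1
    return 1 + sklearn_tree_size(tree, left) + sklearn_tree_size(tree, right)


X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

from_value = sum(len(est.tree_.value) for est in clf.estimators_)
from_count = sum(est.tree_.node_count for est in clf.estimators_)
from_walk = sum(sklearn_tree_size(est.tree_) for est in clf.estimators_)
print(from_value, from_count, from_walk)
```

All three totals should agree; node_count is the most direct of them.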
Upvotes: 0