Newbie

Reputation: 91

How to find out the size of a sklearn decision tree?

I'm doing some feature induction with decision trees and would like to know the size of each tree in terms of its number of nodes. How do I do that in Python?

Using the stock example from sklearn's website,

from sklearn.ensemble import RandomForestClassifier

x = [[0, 0], [0, 1]]
y = [0, 1]

clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(x, y)

I can get to individual trees by something like clf[1], clf[...], but how can I determine the size of each tree in terms of total node number?

Upvotes: 2

Views: 5553

Answers (4)

Tias

Reputation: 41

A sklearn.tree._tree.Tree object has a node_count property:

from sklearn import tree

X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
treeObj = clf.tree_  # the fitted low-level Tree object
print(treeObj.node_count)  # total number of nodes (split nodes + leaves)
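For a fitted binary tree, node_count is the number of split (internal) nodes plus the number of leaves, and because every split has exactly two children, leaves = splits + 1, so node_count = 2 * n_leaves - 1. The stdlib-only sketch below checks this on the parallel-array layout that clf.tree_ exposes (children_left / children_right, where -1 marks a missing child, so a leaf has -1 in both). The arrays are hand-written stand-ins for what the two-sample fit above should produce, not output captured from a real run, and count_nodes is a hypothetical helper name.

```python
def count_nodes(children_left, children_right):
    """Return (total, internal, leaves) for a tree stored as parallel child arrays.

    Assumes the sklearn convention: node i is a leaf when
    children_left[i] == children_right[i] == -1.
    """
    total = len(children_left)
    leaves = sum(1 for lc, rc in zip(children_left, children_right) if lc == -1 and rc == -1)
    return total, total - leaves, leaves


# Hand-written stand-in for clf.tree_ after the two-sample fit:
# node 0 splits into leaves 1 and 2.
left = [1, -1, -1]
right = [2, -1, -1]

total, internal, leaves = count_nodes(left, right)
print(total, internal, leaves)  # 3 1 2
assert total == 2 * leaves - 1  # holds for any binary tree
```

With a real fit, pass clf.tree_.children_left and clf.tree_.children_right; recent scikit-learn versions also offer clf.get_n_leaves() for the leaf count directly.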

Upvotes: 4

user81314

Reputation: 1

Based on the previous answers, the code for a Random Forest in scikit-learn would be:

nodeNumber = sum(est.tree_.node_count for est in clf.estimators_)
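If you also want the size of each individual tree (not just the forest total), the same attribute can be collected into a list first. A minimal sketch, assuming only that each item exposes .tree_.node_count as the fitted estimators in clf.estimators_ do; forest_node_counts is a hypothetical helper, and the SimpleNamespace objects are stand-ins with made-up node counts so the snippet runs without scikit-learn.

```python
from types import SimpleNamespace


def forest_node_counts(estimators):
    """Return (per-tree node counts, forest total).

    `estimators` is any iterable whose items expose `.tree_.node_count`,
    e.g. clf.estimators_ of a fitted RandomForestClassifier.
    """
    counts = [est.tree_.node_count for est in estimators]
    return counts, sum(counts)


# Stand-ins for three fitted trees (hypothetical node counts, not real output).
fake_estimators = [SimpleNamespace(tree_=SimpleNamespace(node_count=n)) for n in (3, 5, 1)]

counts, total = forest_node_counts(fake_estimators)
print(counts, total)  # [3, 5, 1] 9
```

With the forest from the question, forest_node_counts(clf.estimators_) should return the same total as the one-liner above, plus the per-tree breakdown.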

Upvotes: 0

Terence Parr

Reputation: 5962

Max depth is a pretty useful metric, which I didn't find in the API so I wrote this:

def dectree_max_depth(tree):
    # tree is a fitted sklearn Tree object, e.g. clf.tree_
    children_left = tree.children_left
    children_right = tree.children_right

    def walk(node_id):
        # Leaves have children_left == children_right == -1
        if children_left[node_id] != children_right[node_id]:
            left_max = 1 + walk(children_left[node_id])
            right_max = 1 + walk(children_right[node_id])
            return max(left_max, right_max)
        else:  # leaf
            return 1

    # Counts levels, so a root-only tree returns 1 (one more than tree.max_depth)
    root_node_id = 0
    return walk(root_node_id)

You can use it on all trees in a forest (rf) like this:

[dectree_max_depth(t.tree_) for t in rf.estimators_]

BSD license.
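The recursive walk in dectree_max_depth can in principle hit Python's default recursion limit (1000 frames) on very deep, unpruned trees. Below is a stack-based sketch over the same children_left / children_right arrays. It assumes sklearn's layout (node 0 is the root, -1 marks a missing child) and counts the root as depth 0, the convention used by tree_.max_depth, so add 1 to compare with the level count that dectree_max_depth returns. The function name and the arrays are hypothetical stand-ins.

```python
def max_depth_iterative(children_left, children_right):
    """Depth of the deepest leaf, counting the root as depth 0.

    Assumes sklearn's layout: node 0 is the root and leaves have
    children_left[i] == children_right[i] == -1.
    """
    deepest = 0
    stack = [(0, 0)]  # (node_id, depth of that node)
    while stack:
        node_id, depth = stack.pop()
        left, right = children_left[node_id], children_right[node_id]
        if left != right:  # split node: visit both children one level down
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
        else:  # leaf
            deepest = max(deepest, depth)
    return deepest


# Hand-written stand-in: root 0 -> (1, 2), node 1 -> (3, 4); leaves 2, 3, 4.
left = [1, 3, -1, -1, -1]
right = [2, 4, -1, -1, -1]
print(max_depth_iterative(left, right))  # 2
```

For a whole forest the usage mirrors the recursive version: [max_depth_iterative(t.tree_.children_left, t.tree_.children_right) for t in rf.estimators_].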

Upvotes: 0

Arthur Vaïsse

Reputation: 1571

As with any tree in any language:

each node returns 1 plus the sum of the sizes of its subtrees.

In Python, apply this function to the root:

def size(tree):
    # `subtrees` stands for whatever attribute holds a node's children in your tree type
    return 1 + sum(size(subtree) for subtree in tree.subtrees)

Specifically for sklearn, looking at the source code here [https://github.com/scikit-learn/scikit-learn/tree/master/sklearn], I think this could work, since each fitted estimator's tree_.value array has one entry per node:

nodeNumber = sum(len(est.tree_.value) for est in clf.estimators_)
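The recursive definition above also carries over to sklearn's array layout, where the subtrees of node i are children_left[i] and children_right[i], with -1 meaning "no child". A minimal stdlib-only sketch under that assumption; subtree_size is a hypothetical name, the arrays are hand-written stand-ins, and for a well-formed tree the size from the root should equal node_count.

```python
def subtree_size(children_left, children_right, node_id=0):
    """Number of nodes in the subtree rooted at node_id: 1 + sizes of its subtrees.

    Assumes sklearn's convention that -1 marks a missing child (a leaf has two).
    """
    size = 1
    for child in (children_left[node_id], children_right[node_id]):
        if child != -1:
            size += subtree_size(children_left, children_right, child)
    return size


# Hand-written stand-in: root 0 -> (1, 2), node 1 -> (3, 4).
left = [1, 3, -1, -1, -1]
right = [2, 4, -1, -1, -1]
print(subtree_size(left, right))     # 5, the whole tree
print(subtree_size(left, right, 1))  # 3, node 1 plus its two leaves
```

With a fitted estimator, subtree_size(est.tree_.children_left, est.tree_.children_right) can serve as a sanity check against est.tree_.node_count.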

Upvotes: 0
