Alesi Rowland
Alesi Rowland

Reputation: 441

Traversal method of DecisionTreeClassifier in sklearn

This class is used to build a decision tree. A lot of values within the trees structure including tree_.value and tree_.impurity are saved as arrays without much indication to which node each value is referring to. My deductions tell me that they are doing a preorder traversal, but I have no conclusive proof that this is how each array is being constructed. Does anyone know where to find this information?

Upvotes: 0

Views: 262

Answers (1)

Juan Carlos Ramirez
Juan Carlos Ramirez

Reputation: 2129

From tree.pyx:

    def class Tree:
    """Array-based representation of a binary decision tree.
    The binary tree is represented as a number of parallel arrays. The i-th
    element of each array holds information about the node `i`. Node 0 is the
    tree's root. You can find a detailed description of all arrays in
    `_tree.pxd`. NOTE: Some of the arrays only apply to either leaves or split
    nodes, resp. In this case the values of nodes of the other type are
    arbitrary!

So node[0] refers to the root. To add a node, it uses a splitter class on the leaves, which can either split the nodes based on the leaf that has a greater impurity improvement, or a depth-first splitting. I haven't looked into the order nodes are added to the parallel, but my guess is that they are added in the order in which they are created, which would be "similar" to pre-order transversal if the tree were ordered by impurity improvement.

Upvotes: 1

Related Questions