Markus Steinmaßl

Reputation: 23

sklearn.tree.DecisionTreeRegressor: depth of tree is greater than specified when max_leaf_nodes != None

I am currently working on a prediction problem that I tried to solve with scikit-learn's DecisionTreeRegressor, when I came across the following issue:

When fitting a tree with both the max_depth and max_leaf_nodes parameters specified, the resulting tree has depth max_depth + 1. When fitting a tree with only max_depth specified, the resulting tree has the correct depth.

Could this be a bug in the DecisionTreeRegressor class, or am I missing some common knowledge about regression trees?

I am working on a Windows machine, in a Python 3.7 Jupyter notebook, with scikit-learn version 0.20.3. I actually came across this while working with RandomForestRegressor, but found the same issue with DecisionTreeRegressor.

I wrote the following simplified example so you can try it yourself. Just uncomment max_leaf_nodes=10.

I also visualized the trees using graphviz, which indeed showed trees of different depths.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.random.rand(10, 4)
y = np.random.rand(10, 1)

tree = DecisionTreeRegressor(max_depth=2,
                             # max_leaf_nodes=10
                             )
tree.fit(X, y)

# Prints 2 as expected; with max_leaf_nodes=10 uncommented, it prints 3
print(tree.tree_.max_depth)
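For comparison, here is a sketch that fits both variants side by side (the random data is made up, and the exact depths depend on the scikit-learn version; on 0.20.3 the second tree can come out one level deeper than max_depth):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = rng.rand(100)

# max_leaf_nodes left unset
t1 = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# max_leaf_nodes set alongside max_depth
t2 = DecisionTreeRegressor(max_depth=2, max_leaf_nodes=10,
                           random_state=0).fit(X, y)

print(t1.tree_.max_depth, t2.tree_.max_depth)
```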

Thanks for any comments.

Upvotes: 2

Views: 1792

Answers (1)

gmds

Reputation: 19885

Though it is not documented, a DepthFirstTreeBuilder is used to fit the underlying tree object when max_leaf_nodes is not set; when it is set, a BestFirstTreeBuilder is used instead. This difference is what produces trees of different depths.
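Both builders live in the private Cython module sklearn.tree._tree (a private API, so the location may change between versions), which you can confirm yourself:

```python
# Private API: sklearn.tree._tree may be reorganised in future releases.
from sklearn.tree._tree import BestFirstTreeBuilder, DepthFirstTreeBuilder

print(DepthFirstTreeBuilder.__name__, BestFirstTreeBuilder.__name__)
```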

This is an implementation-specific detail, not a consequence of any inherent property of decision trees.

As an aside, I would note that the maximum number of leaf nodes also constrains the maximum depth.
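To illustrate that aside: a binary tree with k leaves has depth at most k - 1, so setting max_leaf_nodes alone already caps the depth. A quick sketch with made-up data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = rng.rand(200)

k = 4
tree = DecisionTreeRegressor(max_leaf_nodes=k, random_state=0).fit(X, y)

# A binary tree with k leaves has at most k - 1 splits on any
# root-to-leaf path, so the depth is bounded by k - 1.
print(tree.tree_.max_depth)  # <= 3 for k = 4
```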

Upvotes: 1
