Reputation: 23
I am currently working on a prediction problem, which I tried to solve with scikit-learn's DecisionTreeRegressor, when I came across the following issue:
When fitting a tree specifying both parameters max_depth and max_leaf_nodes, the depth of the resulting tree is max_depth+1. When fitting a tree specifying only max_depth, the resulting tree has the correct depth.
Could this be a mistake in the DecisionTreeRegressor class, or am I missing some common knowledge about regression trees?
I am working on a Windows machine, in a Python 3.7 Jupyter notebook. The scikit-learn version is 0.20.3.
Actually, I came across this while working with RandomForestRegressor, but found the same issue for DecisionTreeRegressor.
I wrote the following simplified example, so you can try it yourself. Just uncomment max_leaf_nodes=10.
I also visualized the trees using graphviz, which actually showed trees of different depths.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.random.rand(10, 4)
y = np.random.rand(10, 1)

tree = DecisionTreeRegressor(max_depth=2,
                             # max_leaf_nodes=10,
                             )
tree.fit(X, y)
print(tree.tree_.max_depth)
Thanks for any comments.
Upvotes: 2
Views: 1792
Reputation: 19885
Though it is not documented, if max_leaf_nodes is not set, a DepthFirstTreeBuilder will be used to fit the underlying tree object; if it is set, a BestFirstTreeBuilder will be used instead. This difference results in trees of different depths being generated.
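A minimal sketch of the comparison, using synthetic data (the exact depth of the second tree depends on the scikit-learn version installed; on 0.20.3 it comes out one level deeper, as described in the question):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = rng.rand(100)

# Only max_depth set: the tree is built depth-first.
t_depth = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# max_leaf_nodes set as well: the tree is built best-first.
t_best = DecisionTreeRegressor(max_depth=2, max_leaf_nodes=10,
                               random_state=0).fit(X, y)

print(t_depth.tree_.max_depth)  # always <= 2
print(t_best.tree_.max_depth)
```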
This is an implementation-specific detail, not a consequence of any inherent characteristic of decision trees.
As an aside, I would note that the maximum number of leaf nodes also constrains the maximum depth.
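To illustrate that aside: a binary tree with k leaves can have depth at most k-1, so setting max_leaf_nodes=4 caps the depth at 3 regardless of the data. A quick sketch on synthetic data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = rng.rand(200)

# No max_depth at all; only the number of leaves is limited.
tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0).fit(X, y)

# Leaves are nodes with no left child in the underlying tree_ arrays.
n_leaves = int((tree.tree_.children_left == -1).sum())
print(n_leaves, tree.tree_.max_depth)  # at most 4 leaves, depth at most 3
```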
Upvotes: 1