Reputation: 167
I developed a decision tree (ensemble) in Matlab by using the "fitctree" function (link: https://de.mathworks.com/help/stats/classificationtree-class.html).
Now I want to rebuild the same ensemble in Python. Therefore I am using the sklearn library with the "DecisionTreeClassifier" (link: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).
In Matlab I defined the maximum number of splits in each tree by setting 'MaxNumSplits' (the maximal number of decision splits) in the "fitctree" function. This way the number of branch nodes can be controlled.
Now, as I understand the attributes of the "DecisionTreeClassifier" object, there is no such option. Am I right? All I found to control the number of nodes in each tree is "max_leaf_nodes", which obviously controls the number of leaf nodes.
And secondly: what exactly does "max_depth" control? If it is not "None", what does the integer in "max_depth = int" stand for?
I appreciate your help and suggestions. Thank you!
Upvotes: 5
Views: 5180
Reputation: 23637
As far as I know there is no option in scikit-learn to limit the total number of splits (branch nodes) directly. However, you can set `max_leaf_nodes` to `MaxNumSplits + 1` and the result should be equivalent.

Assume our tree has `n_splits` split nodes and `n_leaves` leaf nodes. If we split a leaf node, we turn it into a split node and add two new leaf nodes, so `n_splits` and `n_leaves` both increase by 1. We usually start with only the root node (`n_splits=0`, `n_leaves=1`), and every split increases both numbers by 1. In consequence, the number of leaf nodes always satisfies `n_leaves == n_splits + 1`.
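You can check this equivalence directly on a fitted tree. A minimal sketch (the iris dataset and the value `max_num_splits = 5` are just stand-ins for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Matlab's 'MaxNumSplits' limits split (branch) nodes; in scikit-learn the
# closest equivalent is limiting leaf nodes to MaxNumSplits + 1.
max_num_splits = 5  # hypothetical value you would have passed to fitctree
clf = DecisionTreeClassifier(max_leaf_nodes=max_num_splits + 1, random_state=0)
clf.fit(X, y)

# Every node is either a split node or a leaf, so we can count splits as
# the total node count minus the leaf count.
n_leaves = clf.get_n_leaves()
n_splits = clf.tree_.node_count - n_leaves
print(n_splits, n_leaves)  # n_splits <= 5, and n_leaves == n_splits + 1
```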
As for `max_depth`: the depth is how many "layers" the tree has. More precisely, it is the number of splits along the longest path from the root to a leaf. The `max_depth` parameter restricts this depth; it prevents further splitting of a node that is already too far down the tree. (You can think of `max_depth` as a limit on the number of decisions made before a prediction is reached.)
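To see the effect, you can fit trees with different depth limits and read back the resulting depth via `get_depth()`. A small sketch on synthetic data (the dataset parameters are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# max_depth caps the longest root-to-leaf path: with max_depth=d, a sample
# is classified after at most d feature comparisons.
depths = {}
for d in (1, 2, 3, None):
    clf = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X, y)
    depths[d] = clf.get_depth()
print(depths)  # each fitted depth is bounded by its max_depth (None = unlimited)
```

With `max_depth=1` you get a decision stump: a single split, i.e. one decision before the class is assigned.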
Upvotes: 7