BeCurious
BeCurious

Reputation: 167

Determine the amount of splits in a decision tree of sklearn

I developed a decision tree (ensemble) in Matlab by using the "fitctree"-function (link: https://de.mathworks.com/help/stats/classificationtree-class.html).

Now I want to rebuild the same ensemble in python. Therefor I am using the sklearn library with the "DecisionTreeClassifier" (link: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).

In Matlab I defined the maximum amount of splits in each tree by setting: 'MaxNumSplits' — Maximal number of decision splits in the "fitctree"-function. So with this the amount of branch nodes can be defined.

Now as I understand the attributes of the "DecisionTreeClassifier" object, there isn't any option like this. Am I right? All I found to control the amount of nodes in each tree is the "max_leaf_nodes" which obviously controls the number of leaf nodes.

And secondly: What does "max_depth" exactly control? If it's not "None" what does the integer "max_depth = int" stand for?

I appreciate your help and suggestions. Thank you!

Upvotes: 5

Views: 5180

Answers (1)

MB-F
MB-F

Reputation: 23637

As far I know there is no option to limit the total number of splits (nodes) in scikit-learn. However, you can set max_leaf_nodes to MaxNumSplits + 1 and the result should be equivalent.

Assume our tree has n_split split nodes and n_leaf leaf nodes. If we split a leaf node, we turn it into a split node and add two new leaf nodes. So n_splits and n_leafs both increase by 1. We usually start with only the root node (n_splits=0, n_leafs=1) and every splits increases both numbers. In consequence, the number of leaf nodes is always n_leafs == n_splits + 1.

As for max_depth; the depth is how many "layers" the tree has. In other words, the depth is the maximum number of nodes between the root and the furthest leaf node. The max_depth parameter restricts this depth. It prevents further splitting of a node if it is too far down the tree. (You can think of max_depth as a limiting to the number of splits before a decision is made.)

Upvotes: 7

Related Questions