Reputation: 318
How do we find the maximum depth of a tree in a Random Forest if we know the number of features? This is needed for regularizing a random forest classifier.
Upvotes: 4
Views: 6897
Reputation: 1893
I have not thought about this before. In general the trees are non-deterministic. Instead of asking what the maximum depth is, you may want to know the average depth, or the chance that a tree reaches depth 20... Anyway, it is possible to calculate some bounds on the maximum depth. A node stops splitting when it either runs out of (a) in-bag samples or (b) possible splits.
(a) If the number of in-bag samples (N) is the limiting factor, one could imagine a classification tree where all samples except one are forwarded left at each split. Then the maximum depth is N-1. This outcome is highly unlikely, but possible. At the other extreme, in a perfectly balanced tree where both child nodes are equally big at every split, the minimal depth is ~log2(N), e.g. 16, 8, 4, 2, 1. In practice the tree depth will fall somewhere between the maximal and the minimal. Settings controlling minimal node size would reduce the depth.
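A minimal sketch of these two bounds, using only the standard library (the function name `depth_bounds` is mine, not from any library):

```python
import math

def depth_bounds(n_inbag):
    """Rough depth bounds for a tree grown on n_inbag in-bag samples."""
    # Worst case: each split peels off a single sample, so the tree
    # degenerates into a chain of depth N-1.
    max_depth = n_inbag - 1
    # Best case: every split halves the node (16 -> 8 -> 4 -> 2 -> 1),
    # giving a balanced tree of depth ~ceil(log2(N)).
    min_depth = math.ceil(math.log2(n_inbag))
    return min_depth, max_depth

print(depth_bounds(16))  # -> (4, 15)
```

Real trees land between these two extremes, and a `min_samples_leaf`-style setting lowers the effective N per path.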
(b) To check whether features are limiting tree depth when you know the training set beforehand, count how many training samples are unique. Unique samples (U) cannot be split further. Due to bootstrapping, only ~63% of the samples are selected for each tree, so N ~ U * 0.63. Then use the rules from section (a). All unique samples could be selected during bootstrapping, but that is unlikely too.
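The ~0.63 figure (more precisely 1 - 1/e ≈ 0.632) can be checked with a quick simulation; `unique_fraction` is an illustrative helper, not a library function:

```python
import random

def unique_fraction(n, trials=200, seed=0):
    """Average fraction of distinct samples in a bootstrap of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw n samples with replacement and count the distinct ones.
        sample = {rng.randrange(n) for _ in range(n)}
        total += len(sample) / n
    return total / trials

print(unique_fraction(1000))  # close to 1 - 1/e ~ 0.632
```

So if your training set has U unique samples, each tree effectively sees about 0.63 * U of them.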
If you do not know your training set, try to estimate how many levels (L[i]) could possibly be found in each feature i out of d features. For categorical features the answer may be given. For numeric features drawn from a real distribution, there would be as many levels as there are samples. The number of possible unique samples would be U = L[1] * L[2] * L[3] * ... * L[d].
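As a tiny worked instance of that product (the level counts here are made-up illustrative numbers):

```python
from math import prod

# Hypothetical per-feature level counts: a binary feature, a
# three-level feature, and a ten-level categorical feature.
levels = [2, 3, 10]

# Upper bound on the number of distinct samples: 2 * 3 * 10 = 60.
U = prod(levels)
print(U)
```

That U then feeds back into the bounds from section (a) via N ~ U * 0.63.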
Upvotes: 2