Reputation: 179
It just came to mind that, for example, suppose we have training data with N
points in 2 dimensions. We know that we can always, naively, build a decision tree so that we can classify each data point (we are probably overfitting, and the depth can go up to 2N).
However, we know that if the data set is linearly separable, then the decision tree can probably take advantage of that. Using the above data set as an example, can we determine an upper bound on the depth for linearly and non-linearly separable data sets? Is it guaranteed that the upper bound on the depth for the linear case is smaller than for the nonlinear case?
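For concreteness, here is a minimal sketch of the naive tree I have in mind (assuming scikit-learn; the random 2-D points and labels are just placeholders): with no depth limit, the tree memorizes any training set whose feature vectors are distinct.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
N = 50
X = rng.random((N, 2))           # N points in 2 dimensions
y = rng.integers(0, 2, size=N)   # arbitrary binary labels

tree = DecisionTreeClassifier()  # no max_depth: grow until every leaf is pure
tree.fit(X, y)

print("training accuracy:", tree.score(X, y))   # 1.0, since all points are distinct
print("depth actually used:", tree.get_depth())
```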
Upvotes: 1
Views: 819
Reputation: 9410
A little too late, but still: you can look at this example, in which a dataset that is not linearly separable requires fewer splits than a linearly separable one.
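As one possible construction of that kind (my own sketch with scikit-learn, not necessarily the linked example): a dataset that is not linearly separable can be handled with only two axis-aligned splits, while a linearly separable "diagonal" dataset forces a much deeper staircase of splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# (1) NOT linearly separable: class 0 sits on both sides of class 1 along x
# (a 0-1-0 pattern); the second feature is constant. Two splits on x suffice.
X1 = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0],
               [1.2, 0.0], [2.0, 0.0], [2.2, 0.0]])
y1 = np.array([0, 0, 1, 1, 0, 0])

# (2) Linearly separable, but only by the diagonal y = x: class 0 lies just
# below the diagonal, class 1 just above it. Axis-aligned splits have to
# build a staircase, so the tree ends up much deeper.
n = 10
t = np.arange(n, dtype=float)
X2 = np.vstack([np.column_stack([t, t - 0.5]),   # below the diagonal -> class 0
                np.column_stack([t, t + 0.5])])  # above the diagonal -> class 1
y2 = np.array([0] * n + [1] * n)

for name, X, y in [("not separable", X1, y1), ("separable (diagonal)", X2, y2)]:
    print(name, "-> tree depth:", DecisionTreeClassifier().fit(X, y).get_depth())
```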
Upvotes: 3
Reputation: 6544
suppose we have training data with N points in 2 dimensions. We know that we can always, naively, build a decision tree so that we can classify each data point.
This is not true if there are 2 points with the same features but different labels.
Decision trees make axis-aligned splits, so linear separability doesn't necessarily reduce the number of splits you need in a tree to separate the classes.
Is it guaranteed that the upper bound on the depth for the linear case is smaller than for the nonlinear case?
No. An easy counterexample is to construct a linearly separable data set with 2*N points and N features, where each data point has exactly one non-zero feature value: for class A that value is negative, and for class B it is positive.
This dataset requires splitting on every feature (and thus growing out to maximal depth) to learn, despite being linearly separable.
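A minimal sketch of this counterexample (assuming scikit-learn; N = 5 is an arbitrary choice):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

N = 5
X = np.vstack([-np.eye(N),    # class A: exactly one feature set to -1 per point
               +np.eye(N)])   # class B: exactly one feature set to +1 per point
y = np.array([0] * N + [1] * N)

# Linearly separable: the sign of the feature sum already gives the class.
print("separable by sum(x) > 0:", np.array_equal(X.sum(axis=1) > 0, y == 1))

# Yet every axis-aligned split can only peel off a single point, so the
# tree is forced into a chain of 2*N - 1 splits (the maximal possible depth).
tree = DecisionTreeClassifier().fit(X, y)
print("tree depth:", tree.get_depth(), " maximal depth for 2*N points:", 2 * N - 1)
```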
Upvotes: 1