Danish Shaikh

Reputation: 474

Math behind decision tree regression?

I am trying to understand the math behind decision tree regression. I came across two articles, and each explains the split in a regression tree differently. Can anyone point out which one is correct, or whether both are equivalent and just use a different method?

  1. https://www.saedsayad.com/decision_tree_reg.htm
  2. https://www.python-course.eu/Regression_Trees.php

Thanks,

Upvotes: 1

Views: 1766

Answers (1)

The Mask

Reputation: 579

Both are correct. Method 1 uses standard deviation for splitting the nodes and method 2 uses variance. Both standard deviation and variance are suitable splitting criteria because the target value is continuous.

Variance is one of the most commonly used splitting criteria for regression trees.

Variance
The variance is the average of the squared differences from the mean. To figure out the variance, first calculate the difference between each point and the mean; then, square and average the results.
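Written out for the target values y_1, …, y_n that reach a node, with mean ȳ:

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,
\qquad
\mathrm{Var}(y) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2
```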

Standard Deviation
Standard deviation measures how far a group of numbers spreads out from its mean; it is the square root of the variance. The calculation squares the differences from the mean, which weights outliers more heavily than data very near the mean, and also keeps differences above the mean from canceling out differences below it (without squaring, the deviations from the mean would always sum to zero).
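Correspondingly, the first article scores a candidate split by how much it reduces the weighted standard deviation of the target, often called the standard deviation reduction (SDR). Roughly, with n_k samples going to child k out of n samples at the parent:

```latex
S(y) = \sqrt{\mathrm{Var}(y)},
\qquad
\mathrm{SDR} = S(y_{\text{parent}})
  - \sum_{k \in \{\text{left},\,\text{right}\}} \frac{n_k}{n}\, S(y_k)
```

The variance-based criterion in the second article has the same form with Var in place of S.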

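As an illustration (not the exact code from either article), here is a minimal sketch of a variance-reduction split search on a single feature; the function names and the toy data are made up for the example. Swapping `np.var` for `np.std` would give the standard-deviation-based criterion from method 1.

```python
import numpy as np

def weighted_variance(y_left, y_right):
    """Weighted average of the variances of the two child nodes."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * np.var(y_left) + (len(y_right) / n) * np.var(y_right)

def best_split(x, y):
    """Return the threshold on feature x that maximizes variance reduction."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    parent_var = np.var(y_sorted)

    best_threshold, best_reduction = None, 0.0
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no valid threshold between equal feature values
        threshold = (x_sorted[i] + x_sorted[i - 1]) / 2
        reduction = parent_var - weighted_variance(y_sorted[:i], y_sorted[i:])
        if reduction > best_reduction:
            best_threshold, best_reduction = threshold, reduction
    return best_threshold, best_reduction

# Toy example: the target jumps when x crosses 5,
# so the best threshold should land near 5.
x = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([1.1, 0.9, 1.0, 1.2, 5.0, 5.2, 4.8, 5.1])
print(best_split(x, y))
```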

Upvotes: 2
