staplegun
staplegun

Reputation: 87

How does gbrt_minimize from scikit decide how many parameter splits to try

From my understanding of https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html and gradient boosting decision trees in general, I assume that for N parameters, the regressor picks a bunch of splits along each parameter, works out how applying this split partitions the data, then decides which split to pick which minimises the loss the most (for a particular quantile).

My question is, how do you decide what parameter values to split at, if your parameters are real? I was expecting to find some sort of parameter to determine how many "equally spaced" splits to make, but I only see one that determines the number of data values you'd need either side of a split to make it valid. Does that mean it somehow works this out backwards?

Thanks

Upvotes: 0

Views: 34

Answers (1)

Nick ODell
Nick ODell

Reputation: 25374

My question is, how do you decide what parameter values to split at, if your parameters are real?

It depends on what criterion the tree is using to judge the quality of a split.

I was expecting to find some sort of parameter to determine how many "equally spaced" splits to make

When the tree is deciding what splits to make, it is only considering splits which split the dataset into two parts. It never considers splits which split up the dataset into three parts, or four parts, etc.

There are two reasons for this.

First, it would increase the search space. If you have N datapoints of one variable, you have a linear number of ways to split the dataset into two subsets. But there are a quadratic number of ways to split it into three subsets.

Second, it is redundant to search these. The tree-fitting algorithm will be run recursively on each subset of the dataset, so it is possible the algorithm will split variable #1, then split it again, and end up with a three way split. It all depends on how informative each variable is relative to the others.

Upvotes: 0

Related Questions