Reputation: 67
I understand how gradient boosting works for regression, where we build the next model on the residual error of the previous one: if we use, for example, linear regression, the residual error becomes the target of the next model, and at the end we sum all the models to get a strong learner.
But how is this done in gradient boosted classification trees? Let's say we have a binary classification problem with outcome 0/1: what is the residual error for the next model to be trained on? And how is it calculated, since it cannot simply be y minus y predicted, as it is in linear regression?
I am really stuck on this one! The error of a single binary classification tree is the points it misclassifies, so is the target for the next model only the misclassified points?
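For reference, here is a minimal sketch of the regression case I have in mind, fitting each new weak learner to the residuals of the running prediction (the data and hyperparameters are just illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_rounds, lr = 50, 0.1
prediction = np.full_like(y, y.mean())    # initial model: just the mean
trees = []

for _ in range(n_rounds):
    residual = y - prediction             # squared-error residual is the new target
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += lr * tree.predict(X)    # add the new weak learner to the ensemble
    trees.append(tree)
```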
Upvotes: 3
Views: 509
Reputation: 1068
Binary classification can be posed as a regression problem of predicting a probability, such as P(y=1 | x), where y is the class label. For this to work, you use the log-loss (logistic loss) instead of the squared loss: the "residual" the next tree is trained on is the negative gradient of the log-loss with respect to the current raw prediction, which works out to y minus p, where p is the currently predicted probability.
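A minimal sketch of that idea, assuming a simple gradient step where each regression tree is fitted to the pseudo-residuals y - p (real implementations additionally optimize the leaf values, which is omitted here; data and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary classification data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

n_rounds, lr = 50, 0.1
F = np.zeros(len(y))                      # raw score (log-odds), start at 0
trees = []

for _ in range(n_rounds):
    p = sigmoid(F)                        # current predicted P(y=1 | x)
    pseudo_residual = y - p               # negative gradient of the log-loss w.r.t. F
    tree = DecisionTreeRegressor(max_depth=2).fit(X, pseudo_residual)
    F += lr * tree.predict(X)             # each weak learner is still a regression tree
    trees.append(tree)

final_prob = sigmoid(F)                   # predicted probabilities after boosting
```

So the next model is not trained on the misclassified points only; every point contributes a pseudo-residual, which is just larger in magnitude where the current probability is furthest from the true label.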
Upvotes: 0