Reputation: 75545
Consider the short R script below. It seems that boost.hitters$train.error does not match either the raw residuals or the squared errors on the training set. I could not find any documentation on train.error, so I am wondering whether anyone knows what train.error really represents here and how it is computed.
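library(ISLR)
library(gbm)
set.seed(1)
# Drop rows with missing values and model log-salary
Hitters = na.omit(Hitters)
Hitters$Salary = log(Hitters$Salary)
# Boosted regression trees; no distribution is given, so gbm chooses one itself
boost.hitters = gbm(Salary ~ ., data = Hitters, n.trees = 1000, interaction.depth = 4, shrinkage = 0.01)
# Training-set predictions from the full 1000-tree model
yhat.boost = predict(boost.hitters, newdata = Hitters, n.trees = 1000)
# Three candidate summaries; none of them agree
mean(boost.hitters$train.error^2)
mean(boost.hitters$train.error)
mean((yhat.boost - Hitters$Salary)^2)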
Output:
[1] 0.03704581
[1] 0.1519719
[1] 0.07148612
Upvotes: 4
Views: 2447
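I asked a professor at my university.
Apparently train.error holds the training error (here, the MSE) after each tree is added, so it is a vector with one entry per boosting iteration rather than a vector of per-observation residuals. The training MSE I computed from the full model therefore corresponds to its last entry, not to its mean. In my example: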
mean((yhat.boost-Hitters$Salary)^2) == boost.hitters$train.error[1000]
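One way to check this explanation is to recompute the training MSE at several tree counts and put it next to the stored values. The following is only a sketch: the names checkpoints, pred and mse.by.hand are made up for illustration, distribution = "gaussian" is an assumption made explicit here (the original call leaves it to gbm), and all.equal is used instead of == because exact floating-point equality may not hold.

```r
library(ISLR)
library(gbm)

set.seed(1)
Hitters <- na.omit(Hitters)
Hitters$Salary <- log(Hitters$Salary)

# Assumption: squared-error loss, stated explicitly rather than left to gbm
boost.hitters <- gbm(Salary ~ ., data = Hitters, distribution = "gaussian",
                     n.trees = 1000, interaction.depth = 4, shrinkage = 0.01)

# Passing a vector as n.trees returns one column of predictions per value
checkpoints <- c(1, 10, 100, 500, 1000)
pred <- predict(boost.hitters, newdata = Hitters, n.trees = checkpoints)

# Training MSE of the ensemble truncated at each checkpoint
mse.by.hand <- colMeans((pred - Hitters$Salary)^2)

# Side-by-side view of the hand-computed and stored values
cbind(trees = checkpoints,
      by.hand = mse.by.hand,
      train.error = boost.hitters$train.error[checkpoints])

# Tolerant comparison; returns TRUE or a description of the difference
all.equal(unname(mse.by.hand), boost.hitters$train.error[checkpoints])
```

If the explanation above holds, the two columns should agree up to numerical tolerance at every checkpoint, not only at the last tree.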
Upvotes: 6