I am trying to fit an xgboost model using the native pseudo-Huber loss reg:pseudohubererror. However, it doesn't seem to be working: neither the training error nor the test error is improving. It works just fine with reg:squarederror. What am I missing?
Code:
library(xgboost)

# Simulate a simple linear regression problem
n = 1000
X = cbind(runif(n, 10, 20), runif(n, 0, 10))
y = X %*% c(2, 3) + rnorm(n, 0, 1)

# Hold out the last observation as a one-row test set
train = xgb.DMatrix(data = X[-n, ], label = y[-n])
test  = xgb.DMatrix(data = t(as.matrix(X[n, ])), label = y[n])
watchlist = list(train = train, test = test)

xbg_test = xgb.train(data = train,
                     objective = "reg:pseudohubererror",
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)
Result:
[1] train-mae:44.372692 test-mae:33.085709
Multiple eval metrics are present. Will use test_mae for early stopping.
Will train until test_mae hasn't improved in 100 rounds.
[2] train-mae:44.372692 test-mae:33.085709
[3] train-mae:44.372688 test-mae:33.085709
[4] train-mae:44.372688 test-mae:33.085709
[5] train-mae:44.372688 test-mae:33.085709
[6] train-mae:44.372688 test-mae:33.085709
[7] train-mae:44.372688 test-mae:33.085709
[8] train-mae:44.372688 test-mae:33.085709
[9] train-mae:44.372688 test-mae:33.085709
[10] train-mae:44.372692 test-mae:33.085709
Upvotes: 7
Views: 4940
I don't have an answer for the "why" part of the question, but I had the same problem and found a solution that worked for me.
In my case, the algorithm only started converging after I standardized the label:
Label_standard = (Label - mean(Label)) / sd(Label)
Make sure to compute the mean and sd from the training data only, without including the test set!
After training the model and generating predictions, transform the standardized predictions back to the original scale using the mean and sd computed from the training data.
I got this idea because I found that the algorithm didn't converge when the label values were "big". I would be interested to understand the "why" as well.
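As a minimal sketch of that workflow (assuming the X, y, and n objects from the question are in scope; fit and the choice of nrounds are illustrative):

```r
# Standardize the label using training-set statistics only
y_train <- y[-n]
mu  <- mean(y_train)
sdv <- sd(y_train)

train_std <- xgb.DMatrix(data = X[-n, ], label = (y_train - mu) / sdv)
test_std  <- xgb.DMatrix(data = t(as.matrix(X[n, ])),
                         label = (y[n] - mu) / sdv)

fit <- xgb.train(data = train_std, objective = "reg:pseudohubererror",
                 eta = 0.01, nrounds = 500)

# Back-transform predictions to the original scale
pred <- predict(fit, test_std) * sdv + mu
```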
Upvotes: 0
It seems like that is the expected behavior of the pseudo-Huber loss. Here I hard-coded the first and second derivatives of the objective loss function found here and fed them in via the obj argument (obje below). If you run it and compare with the objective = "reg:pseudohubererror" version, you'll see they are the same. As for why it is so much worse than squared loss, I'm not sure.
set.seed(20)

# Custom objective: hand-coded gradient (fir) and hessian (sec)
# of the pseudo-Huber loss, with a = prediction and d = label
obje = function(pred, dData) {
  labels = getinfo(dData, "label")
  a = pred
  d = labels
  fir = a^2 / sqrt(a^2/d^2 + 1) / d - 2*d*(sqrt(a^2/d^2 + 1) - 1)
  sec = ((2*(a^2/d^2 + 1)^(3/2) - 2)*d^2 - 3*a^2) / ((a^2/d^2 + 1)^(3/2)*d^2)
  return(list(grad = fir, hess = sec))
}

xbg_test = xgb.train(data = train,
                     obj = obje,
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)
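One way to verify the two versions match is to train each for a few rounds and compare the evaluation logs. A sketch (assumes the train and watchlist objects from the question and obje above are in scope; fit_builtin and fit_custom are illustrative names):

```r
# Run a few rounds with each objective and compare the recorded test MAE
fit_builtin <- xgb.train(data = train, objective = "reg:pseudohubererror",
                         eval_metric = "mae", watchlist = watchlist,
                         eta = 0.01, nrounds = 5, verbose = 0)
fit_custom  <- xgb.train(data = train, obj = obje,
                         eval_metric = "mae", watchlist = watchlist,
                         eta = 0.01, nrounds = 5, verbose = 0)

all.equal(fit_builtin$evaluation_log$test_mae,
          fit_custom$evaluation_log$test_mae)
```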
Upvotes: 2