PaulG

Reputation: 296

Why does training Xgboost model with pseudo-Huber loss return a constant test metric?

I am trying to fit an xgboost model using the native pseudo-Huber loss reg:pseudohubererror. However, it doesn't seem to be working, since neither the training error nor the test error improves. It works just fine with reg:squarederror. What am I missing?

Code:

library(xgboost)
n = 1000
X = cbind(runif(n, 10, 20), runif(n, 0, 10))  # two uniform features
y = X %*% c(2, 3) + rnorm(n, 0, 1)            # linear signal plus Gaussian noise

# train on the first n-1 rows, hold out the last row as a one-point test set
train = xgb.DMatrix(data  = X[-n, ],
                    label = y[-n])

test = xgb.DMatrix(data  = t(as.matrix(X[n, ])),
                   label = y[n])

watchlist = list(train = train, test = test)

xbg_test = xgb.train(data = train,
                     objective = "reg:pseudohubererror",
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)

Result:

[1] train-mae:44.372692 test-mae:33.085709 
Multiple eval metrics are present. Will use test_mae for early stopping.
Will train until test_mae hasn't improved in 100 rounds.

[2] train-mae:44.372692 test-mae:33.085709 
[3] train-mae:44.372688 test-mae:33.085709 
[4] train-mae:44.372688 test-mae:33.085709 
[5] train-mae:44.372688 test-mae:33.085709 
[6] train-mae:44.372688 test-mae:33.085709 
[7] train-mae:44.372688 test-mae:33.085709 
[8] train-mae:44.372688 test-mae:33.085709 
[9] train-mae:44.372688 test-mae:33.085709 
[10]    train-mae:44.372692 test-mae:33.085709 

Upvotes: 7

Views: 4940

Answers (2)

schmitzi89

Reputation: 65

I don't have an answer for the "why" part of the question, but I had the same problem and found a solution that worked for me.

In my case, the algorithm only started converging after I standardized the label:

Label_standard = (Label - mean(Label)) / sd(Label)

Make sure to calculate the mean and sd from the training data only, not from the testing dataset!

After training the model and generating predictions, you need to transform the standardized predictions back to the original scale, using the mean and sd that you calculated from the training dataset.
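A minimal sketch of that workflow (X_train, X_test, y_train, and fit are placeholder names, not from the original answer):

mu = mean(y_train)  # statistics from the training labels only
s  = sd(y_train)
y_train_std = (y_train - mu) / s

train_std = xgb.DMatrix(data = X_train, label = y_train_std)
fit = xgb.train(data = train_std,
                objective = "reg:pseudohubererror",
                eta = 0.01, nrounds = 1000)

pred_std = predict(fit, X_test)  # predictions on the standardized scale
pred     = pred_std * s + mu     # back-transform to the original scale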

I got this idea because I found that the algorithm didn't converge when the label values were "big". I would be interested to understand the "why" as well.

Upvotes: 0

Vons

Reputation: 3335

It seems like that is the expected behavior of the pseudo-Huber loss. Here I hard-coded the first and second derivatives of the objective loss function found here and fed them in via the obj = obje parameter. If you run it and compare with the objective = "reg:pseudohubererror" version, you'll see they are the same. As for why it is so much worse than squared loss, I'm not sure.
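For reference, the standard pseudo-Huber loss with slope parameter δ is usually written as

L_δ(z) = δ² * (sqrt(1 + (z/δ)²) − 1)

where z is the residual. In the code below, a holds the raw predictions and d the labels, as the hard-coded derivatives are expressed in those two variables.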

set.seed(20)

obje = function(pred, dData) {
  # custom objective: return the first (grad) and second (hess)
  # derivative of the loss for each observation
  labels = getinfo(dData, "label")
  a = pred    # raw predictions
  d = labels  # true labels
  fir = a^2 / sqrt(a^2/d^2 + 1) / d - 2*d*(sqrt(a^2/d^2 + 1) - 1)
  sec = ((2*(a^2/d^2 + 1)^(3/2) - 2)*d^2 - 3*a^2) / ((a^2/d^2 + 1)^(3/2)*d^2)
  return(list(grad = fir, hess = sec))
}

xbg_test = xgb.train(data = train,
                     obj = obje,
                     eval_metric = "mae",
                     watchlist = watchlist,
                     gamma = 1,
                     eta = 0.01,
                     nrounds = 10000,
                     early_stopping_rounds = 100)
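A quick way to run the comparison described above, as a sketch that reuses the question's train and watchlist objects (the xgb_builtin name is just for illustration):

xgb_builtin = xgb.train(data = train,
                        objective = "reg:pseudohubererror",
                        eval_metric = "mae",
                        watchlist = watchlist,
                        gamma = 1,
                        eta = 0.01,
                        nrounds = 10000,
                        early_stopping_rounds = 100)

# if the custom objective matches the built-in one, the per-round
# test MAE stored in each booster's evaluation_log should agree
all.equal(xgb_builtin$evaluation_log$test_mae,
          xbg_test$evaluation_log$test_mae)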

Upvotes: 2
