Reputation: 321
Taking a cue from "How to access weighting of individual decision trees in xgboost?": how does one calculate the weights when objective = "binary:logistic" and eta = 0.1?
My tree dump is:
booster[0]
0:[WEIGHT<3267.5] yes=1,no=2,missing=1,gain=133.327,cover=58.75
  1:[CYLINDERS<5.5] yes=3,no=4,missing=3,gain=9.61229,cover=33.25
    3:leaf=0.872727,cover=26.5
    4:leaf=0.0967742,cover=6.75
  2:[WEIGHT<3431] yes=5,no=6,missing=5,gain=4.82912,cover=25.5
    5:leaf=-0.0526316,cover=3.75
    6:leaf=-0.846154,cover=21.75
booster[1]
0:[DISPLACEMENT<231.5] yes=1,no=2,missing=1,gain=60.9437,cover=52.0159
  1:[WEIGHT<2974.5] yes=3,no=4,missing=3,gain=6.59775,cover=31.3195
    3:leaf=0.582471,cover=25.5236
    4:leaf=-0,cover=5.79593
  2:[MODELYEAR<78.5] yes=5,no=6,missing=5,gain=1.96045,cover=20.6964
    5:leaf=-0.643141,cover=19.3965
    6:leaf=-0,cover=1.2999
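(For reference, a dump in this format can be produced roughly as below; the data objects and feature matrix here are placeholders, not the original dataset.)

# Minimal sketch (placeholder data): train a small binary model and dump the trees
library(xgboost)
# 'X' is a numeric feature matrix and 'y' a 0/1 label vector, both assumed to exist
dtrain <- xgb.DMatrix(data = X, label = y)
bst <- xgb.train(params = list(objective = "binary:logistic", eta = 0.1, max_depth = 2),
                 data = dtrain, nrounds = 2)
# with_stats = TRUE includes gain and cover, as in the dump shown above
cat(xgb.dump(bst, with_stats = TRUE), sep = "\n")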
Upvotes: 1
Views: 1361
Reputation: 321
Actually, this turned out to be quite practical; I had overlooked it earlier.
Using the above tree structure, one can find the probability for each training example.
The parameter list was:
param <- list("objective" = "binary:logistic",
              "eval_metric" = "logloss",
              "eta" = 0.5,
              "max_depth" = 2,
              "colsample_bytree" = 0.8,
              "subsample" = 0.8,
              "alpha" = 1)
For the instances that end up in node 3 of booster[0], the probability will be exp(0.872727) / (1 + exp(0.872727)).
For instances in node 3 of booster[0] plus node 3 of booster[1], the probability will be exp(0.872727 + 0.582471) / (1 + exp(0.872727 + 0.582471)).
And so on, as the number of boosting iterations increases.
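A minimal sketch of that accumulation in R, with the leaf values copied from the dump above (plogis(x) is just exp(x) / (1 + exp(x))):

# Leaf values for an instance that lands in node 3 of booster[0] and node 3 of booster[1]
leaf_values <- c(0.872727, 0.582471)

# Probability after the first tree only
plogis(leaf_values[1])        # exp(0.872727) / (1 + exp(0.872727)) ~ 0.705

# Probability after the first two trees: logistic of the summed leaf values
plogis(sum(leaf_values))      # ~ 0.811

# One probability per boosting round
plogis(cumsum(leaf_values))

With the default base_score of 0.5, the initial margin is 0, so the summed leaf values give the full margin; a different base_score would add a constant offset before the logistic transform.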
I matched these values against R's predicted probabilities; they differ by about 10^(-7), probably due to the leaf quality scores being truncated in the dump.
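One way to do such a comparison, tree by tree, is sketched below (not the original code; 'bst' is the trained booster, 'dtrain' the same instances as an xgb.DMatrix, and 'manual_margin' is assumed to hold the per-instance sums of leaf values read from the dump; older xgboost versions expose ntreelimit, newer ones use iterationrange):

# Model predictions restricted to the first k trees
p1 <- predict(bst, dtrain, ntreelimit = 1)   # first tree only
p2 <- predict(bst, dtrain, ntreelimit = 2)   # first two trees

# Compare against manual probabilities built from the dumped leaf values
max(abs(p2 - plogis(manual_margin)))         # differences of the order 10^(-7), as noted above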
This might not answer how to find the weights, but it does give a production-level solution when boosted trees trained in R are used for prediction in a different environment.
Any comment on this will be highly appreciated.
Upvotes: 2