Reputation: 23068
When using XGBRegressor, the base_score parameter can be used to set the initial prediction value for all data points; typically it is set to the mean of the observed values in the training set.
Is it possible to achieve something similar with XGBClassifier, by specifying a value for every target class, when the objective parameter is set to multi:softprob?
E.g. counting the occurrences of each target class in the training set and normalizing by the total count would give us:
class   pct_total
-----------------
blue    0.57
red     0.22
green   0.16
black   0.05
So that on its first iteration, XGBClassifier would start from these per-class values for every data point, instead of simply starting from 1 / num_classes for every class.
Is it possible to achieve this?
Upvotes: 1
Views: 1111
Reputation: 12602
You can accomplish this using the base_margin parameter. Read about it in the docs; the referenced demo uses the native API and DMatrix, but as the docs note, you can also pass base_margin to the XGBClassifier.fit method (with a new enough xgboost version).
The shape of base_margin is expected to be (n_samples, n_classes): since xgboost fits multiclass models with one booster per class, you are providing, for each sample, a base score for each class's separate GBM. Note also that these values live in raw margin (log) space, not probability space, so transform accordingly. Also don't forget to pass base_margin to every prediction call as well (it would be nicer if this were a builtin that was saved with the model).
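A minimal sketch of the log-space transform, using only the standard library. The class counts mirror the question's example table; the xgboost calls at the end are commented out and are an assumption about the sklearn-wrapper API (fit/predict accepting base_margin), so verify them against your installed version. Because softmax is shift-invariant and the priors already sum to 1, log(prior) is a valid margin row: softmax(log(p)) == p.

```python
import math

# Hypothetical training labels with the question's class proportions.
y = ["blue"] * 57 + ["red"] * 22 + ["green"] * 16 + ["black"] * 5
classes = sorted(set(y))  # label-encoded class order

# Per-class prior probabilities from the training set.
priors = {c: y.count(c) / len(y) for c in classes}

# base_margin values must be in log (raw margin) space:
# one row per sample, one column per class.
margin_row = [math.log(priors[c]) for c in classes]

# Sanity check: softmax of the margin row recovers the priors exactly.
exps = [math.exp(m) for m in margin_row]
total = sum(exps)
recovered = [e / total for e in exps]

# With xgboost installed, the usage would look roughly like this
# (assumed API, not verified here):
#   base_margin = np.tile(margin_row, (len(X_train), 1))  # (n_samples, n_classes)
#   clf = XGBClassifier(objective="multi:softprob")
#   clf.fit(X_train, y_train, base_margin=base_margin)
#   clf.predict_proba(X_test,
#                     base_margin=np.tile(margin_row, (len(X_test), 1)))
```

Remember that the same per-row margin must accompany every predict/predict_proba call, otherwise the model's raw scores are offset relative to how it was trained.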
Upvotes: 2