Fiction

Reputation: 335

XGBoost python - classifier class weight option?

Is there a way to set different class weights for the XGBoost classifier? For example, in sklearn's RandomForestClassifier this is done via the class_weight parameter.

Upvotes: 21

Views: 52221

Answers (7)

Firas Omrane

Reputation: 932

For sklearn version < 0.19

Just assign each entry of your training data its class weight: first compute the class weights with sklearn's class_weight.compute_class_weight, then assign each row of the training data its appropriate weight.

I assume here that the training data has a column class containing the class number, and that the classes are numbered from 1 to nb_classes.

import numpy as np
from sklearn.utils import class_weight

# One weight per class, ordered by np.unique(train_df['class'])
classes_weights = list(class_weight.compute_class_weight('balanced',
                                             np.unique(train_df['class']),
                                             train_df['class']))

# One weight per training row, looked up by class label (labels run from 1 to nb_classes)
weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]

xgb_classifier.fit(X, y, sample_weight=weights)

Update for sklearn version >= 0.19

There is a simpler solution:

from sklearn.utils import class_weight
classes_weights = class_weight.compute_sample_weight(
    class_weight='balanced',
    y=train_df['class']
)

xgb_classifier.fit(X, y, sample_weight=classes_weights)

Upvotes: 18

skibee

Reputation: 1332

Similar to @Firas Omrane's and @Pramit's answers, but I think it is slightly more pythonic:


    import numpy as np
    from sklearn.utils import class_weight

    # Map each class label to its balanced weight
    class_weights = dict(
        zip(
            np.unique(train['class']),
            class_weight.compute_class_weight(
                'balanced', classes=np.unique(train['class']), y=train['class']
            ),
        )
    )

    # sample_weight expects one weight per row, so map the dict over the labels
    xgb_classifier.fit(
        X, train['class'],
        sample_weight=train['class'].map(class_weights),
    )

Upvotes: 0

SriK

Reputation: 1121

The answers here are outdated. The sample_weight parameter is no longer supported; it has been replaced with scale_pos_weight.

Instead, just set scale_pos_weight = sum(negative instances) / sum(positive instances)
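A minimal sketch of that calculation, assuming a binary target y_train with labels 0 and 1 (X and y_train stand in for your own training data):

import numpy as np
from xgboost import XGBClassifier

# Ratio of negative to positive examples in the training labels
ratio = float(np.sum(y_train == 0)) / np.sum(y_train == 1)

# Pass the ratio at construction time instead of per-sample weights at fit time
xgb_classifier = XGBClassifier(scale_pos_weight=ratio)
xgb_classifier.fit(X, y_train)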

Upvotes: 5

Tianhuang Su

Reputation: 41

from sklearn.utils.class_weight import compute_sample_weight
# One balanced weight per sample, computed directly from the labels
xgb_classifier.fit(X, y, sample_weight=compute_sample_weight("balanced", y))

Upvotes: 3

skeller88

Reputation: 4504

You can alternatively use the scale_pos_weight hyperparameter, as discussed in the XGBoost docs. The advantage of this approach is that you don't have to construct a sample weight vector or pass it in at fit time.
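As a sketch, scale_pos_weight can then be tuned like any other hyperparameter, e.g. with GridSearchCV (the grid values below are hypothetical; a common starting point is the negative/positive ratio):

from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grid of candidate weights; centre it on your negative/positive ratio
param_grid = {'scale_pos_weight': [1, 5, 10, 20]}

search = GridSearchCV(XGBClassifier(), param_grid, scoring='f1', cv=3)
search.fit(X, y)  # no sample_weight needed at fit time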

Upvotes: 0

Pramit

Reputation: 1411

I recently ran into this problem, so I thought I'd leave a solution I tried.

import numpy as np
from xgboost import XGBClassifier

# Manually handling imbalance. Below is the same as computing
# float(18501)/392318 on the training dataset.
# We are going to inversely assign the weights.
weight_ratio = float(len(y_train[y_train == 0])) / float(len(y_train[y_train == 1]))

# Use a float array; an int array would silently truncate the fractional weights
w_array = np.ones(y_train.shape[0], dtype=float)
w_array[y_train == 1] = weight_ratio
w_array[y_train == 0] = 1 - weight_ratio

xgc = XGBClassifier()
xgc.fit(x_df_i_p_filtered, y_train, sample_weight=w_array)

Not sure why, but the results were pretty disappointing. Hope this helps someone.

Reference: https://www.programcreek.com/python/example/99824/xgboost.XGBClassifier

Upvotes: 7

epattaro

Reputation: 2428

When using the sklearn wrapper, there is a parameter for sample weights.

Example:

import xgboost as xgb

# sample_weight takes one weight per training example
exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)

where the parameter should be array-like, with length N equal to the length of the target.

Upvotes: 13
