Reputation: 335
Is there a way to set different class weights for the xgboost classifier? For example, in sklearn's RandomForestClassifier this is done with the "class_weight" parameter.
Upvotes: 21
Views: 52221
Reputation: 932
For sklearn version < 0.19
Just assign each entry of your train data its class weight. First get the class weights with sklearn's class_weight.compute_class_weight, then assign each row of the train data its appropriate weight.
I assume here that the train data has a column class containing the class number, and that there are nb_classes classes numbered from 1 to nb_classes.
import numpy as np
from sklearn.utils import class_weight

# One weight per class, in class-label order (classes are numbered 1..nb_classes)
classes_weights = list(class_weight.compute_class_weight('balanced',
                                                         np.unique(train_df['class']),
                                                         train_df['class']))

# One weight per training row, looked up by that row's class
weights = np.ones(y_train.shape[0], dtype='float')
for i, val in enumerate(y_train):
    weights[i] = classes_weights[val - 1]

xgb_classifier.fit(X, y, sample_weight=weights)
Update for sklearn version >= 0.19
There is a simpler solution:
from sklearn.utils import class_weight

# compute_sample_weight returns one weight per row directly
classes_weights = class_weight.compute_sample_weight(
    class_weight='balanced',
    y=train_df['class']
)

xgb_classifier.fit(X, y, sample_weight=classes_weights)
Upvotes: 18
Reputation: 1332
Similar to @Firas Omrane's and @Pramit's answers, but I think it is slightly more pythonic:
import numpy as np
from sklearn.utils import class_weight

# One weight per class; zipping with [0, 1] assumes binary labels 0 and 1
class_weights = dict(
    zip(
        [0, 1],
        class_weight.compute_class_weight(
            'balanced', classes=np.unique(train['class']), y=train['class']
        ),
    )
)

# sample_weight expects one weight per row, so map each row's class to its weight
xgb_classifier.fit(X, train['class'], sample_weight=train['class'].map(class_weights))
Upvotes: 0
Reputation: 1121
The answers here are outdated. The sample_weight parameter is no longer supported. It has been replaced with scale_pos_weight.
Rather, just set scale_pos_weight = sum(negative instances) / sum(positive instances).
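A minimal sketch of that formula, assuming a feature matrix X_train and a binary 0/1 label array y_train (both names are just illustrative):
import numpy as np
from xgboost import XGBClassifier

# Ratio of negative to positive instances, as in the formula above
neg, pos = np.sum(y_train == 0), np.sum(y_train == 1)

# scale_pos_weight is a constructor hyperparameter, not a fit-time argument
clf = XGBClassifier(scale_pos_weight=neg / pos)
clf.fit(X_train, y_train)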
Upvotes: 5
Reputation: 41
from sklearn.utils.class_weight import compute_sample_weight
xgb_classifier.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
Upvotes: 3
Reputation: 4504
You can alternatively use the scale_pos_weight hyperparameter, as discussed in the XGBoost docs. The advantage of this approach is that you don't have to construct the sample weight vector and don't have to pass it in at fit time.
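As a rough sketch (the ratio of 10 is only a placeholder; compute it from your own class counts as in the other answers), because the weighting lives in the estimator itself, standard sklearn tooling such as cross_val_score needs no extra fit parameters:
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# The class weighting is part of the estimator's configuration...
clf = XGBClassifier(scale_pos_weight=10)  # placeholder ratio

# ...so no per-row sample_weight has to be threaded through fit or cross-validation
scores = cross_val_score(clf, X, y, cv=5)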
Upvotes: 0
Reputation: 1411
I recently ran into this problem, so I thought I would leave a solution I tried:
import numpy as np
from xgboost import XGBClassifier

# Manually handling imbalance. Below is the same as computing float(18501)/392318
# on the training dataset. We are going to inversely assign the weights.
weight_ratio = float(len(y_train[y_train == 0])) / float(len(y_train[y_train == 1]))

# Use a float array; an integer array would silently truncate the weights
w_array = np.ones(y_train.shape[0], dtype=float)
w_array[y_train == 1] = weight_ratio
w_array[y_train == 0] = 1 - weight_ratio

xgc = XGBClassifier()
xgc.fit(x_df_i_p_filtered, y_train, sample_weight=w_array)
Not sure why, but the results were pretty disappointing. Hope this helps someone.
Reference link: https://www.programcreek.com/python/example/99824/xgboost.XGBClassifier
Upvotes: 7
Reputation: 2428
When using the sklearn wrapper, there is a parameter for sample weights.
Example:
import xgboost as xgb

exgb_classifier = xgb.XGBClassifier()
exgb_classifier.fit(X, y, sample_weight=sample_weights_data)
where the parameter should be array-like of length N, equal to the length of the target.
Upvotes: 13