Reputation: 352
I am trying to use the 'is_unbalance' parameter when training a LightGBM model for a binary classification problem where the positive class is approximately 3%. If I set 'is_unbalance', the binary log loss drops in the first iteration but then keeps increasing. I notice this behavior only when 'is_unbalance' is enabled; otherwise, there is a steady drop in log_loss. I'd appreciate your help on this. Thanks.
Upvotes: 7
Views: 14745
Reputation: 684
When you do not balance the classes for such an unbalanced dataset, the objective value will obviously keep dropping, and the model will probably end up assigning every prediction to the majority class while still reporting a fantastic objective value.
Balancing the classes is necessary, but it doesn't mean you should stop at is_unbalance. You can also use scale_pos_weight, a custom metric, or per-sample weights, like the following:
import lightgbm as lgb

# per-class weight: the minority class gets 1.0, the majority class gets a proportionally smaller weight
WEIGHTS = y_train.value_counts(normalize=True).min() / y_train.value_counts(normalize=True)
TRAIN_WEIGHTS = y_train.map(WEIGHTS).values  # look up each sample's class weight by its label
train_data = lgb.Dataset(X_train, label=y_train, weight=TRAIN_WEIGHTS)
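For reference, a minimal training sketch on top of that weighted dataset; the parameter values are illustrative assumptions, and X_valid / y_valid stand for a held-out split that is not part of the original code:

params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'learning_rate': 0.05,  # illustrative value, not tuned
    'num_leaves': 31,       # LightGBM default
}
# evaluate on a held-out split so the log_loss curve reflects generalization
valid_data = lgb.Dataset(X_valid, label=y_valid, reference=train_data)
model = lgb.train(params, train_data, num_boost_round=200, valid_sets=[valid_data])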
Also, tuning the other hyperparameters should help resolve the increasing log_loss.
Upvotes: 4
Reputation: 388
When you set is_unbalance: true, the algorithm automatically balances the weight of the dominated label (using the pos/neg fraction in the training set).
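For example, a minimal parameter dict that turns this on (the other entries are assumptions, not from the answer):

params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'is_unbalance': True,  # LightGBM re-weights the minority class using the neg/pos ratio
}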
If you want to change scale_pos_weight (its default is 1, which assumes the positive and negative labels are equally weighted) for an unbalanced dataset, you can use the following formula (based on this issue on the LightGBM repository) to set it correctly.
scale_pos_weight = number of negative samples / number of positive samples
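A short sketch of computing it from the training labels (y_train is assumed to be the binary label Series from the question); note that LightGBM expects you to choose either is_unbalance or scale_pos_weight, not both:

n_pos = (y_train == 1).sum()
n_neg = (y_train == 0).sum()
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'scale_pos_weight': n_neg / n_pos,  # roughly 32 for a ~3% positive class
}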
Upvotes: 1