Bar

Reputation: 83

A more imbalanced approach to compute_class_weight

I have a large multi-label array with numbers between 0 and 65. I'm using the following code to generate class weights:

class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(labels), y=labels)

Here, labels is the array containing numbers between 0 and 65.
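For context, here is a minimal sketch of what the 'balanced' mode computes, namely n_samples / (n_classes * count_of_class). The toy labels array is made up for illustration and is not my real data; class_weight_dict is just a hypothetical name for the mapping a model's class_weight argument typically accepts.

```python
import numpy as np
from sklearn.utils import class_weight

# Hypothetical toy labels: many 0s and 1s, few higher values.
labels = np.array([0] * 50 + [1] * 40 + [2] * 8 + [3] * 2)
classes = np.unique(labels)

# Recent scikit-learn versions require keyword arguments here.
weights = class_weight.compute_class_weight(
    class_weight='balanced', classes=classes, y=labels
)

# The same values computed by hand: n_samples / (n_classes * class_count).
# np.bincount lines up with classes here because the toy labels are 0..3.
counts = np.bincount(labels)
manual = len(labels) / (len(classes) * counts)

# Mapping from class label to weight, e.g. for a class_weight argument.
class_weight_dict = dict(zip(classes, weights))
```

Because the weight is inversely proportional to the class count, a class with very few examples gets a very large weight, which is where the over-correction comes from.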

I pass these weights to the model through its class_weight argument because I have many examples of "0" and "1" but few examples with labels above 1, and I wanted the model to give more weight to the rarer classes. This helped a lot. However, the model now gives too much weight to the rare classes and somewhat neglects the most frequent ones (0 and 1). I'm looking for a middle ground and would appreciate any tips on how to proceed.

Upvotes: 0

Views: 241

Answers (1)

think-maths

Reputation: 967

You can achieve this in two ways, provided the weights are assigned correctly, that is, rarer labels get larger weights and frequent labels get smaller ones, which you presumably already do.

  1. Reduce the number of examples of the most frequent labels (0 and 1 in your case) to a level closer to the other labels, provided this does not shrink your dataset by a large margin. This is often not feasible when the rarer labels have very few examples, so it is something you will have to decide on.
  2. The other, and more plausible, solution is to either oversample the rare labels by duplicating their examples or undersample the most frequent labels.
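Both options can be combined in one resampling step. The following is only a rough sketch under assumptions: resample_indices, min_count and max_count are hypothetical names, and the toy data stands in for your real features and labels. Classes with fewer than min_count examples are oversampled with replacement, and classes with more than max_count examples are undersampled without replacement.

```python
import numpy as np

def resample_indices(y, min_count, max_count, seed=0):
    """Return shuffled row indices with per-class counts moved into [min_count, max_count]."""
    rng = np.random.default_rng(seed)
    picked = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        if len(idx) < min_count:
            # Oversample: keep every original row and add random copies.
            extra = rng.choice(idx, size=min_count - len(idx), replace=True)
            idx = np.concatenate([idx, extra])
        elif len(idx) > max_count:
            # Undersample: keep a random subset without duplicates.
            idx = rng.choice(idx, size=max_count, replace=False)
        picked.append(idx)
    return rng.permutation(np.concatenate(picked))

# Hypothetical toy data: many 0s and 1s, few higher labels.
y = np.array([0] * 50 + [1] * 40 + [2] * 8 + [3] * 2)
X = np.arange(len(y)).reshape(-1, 1)

idx = resample_indices(y, min_count=10, max_count=30)
X_res, y_res = X[idx], y[idx]
```

Moving min_count and max_count closer together or further apart lets you tune how strongly the classes are evened out. Resample only the training split, so that duplicated rows do not leak into validation data.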

Upvotes: 0
