Reputation: 1161
Keras uses a class_weight
parameter to deal with imbalanced datasets.
Here is what we can find in the doc:
Optional dictionary mapping class indices (integers) to a weight (float) to apply to the model's loss for the samples from this class during training. This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
Does that mean that class_weight
gives a different weight in the training loss function to each class? Does it have an influence elsewhere? Is it really effective against generalization error, compared with "physically" dropping instances from the most represented class?
Upvotes: 4
Views: 502
Reputation: 569
The class_weight parameter scales the loss associated with each training example in proportion to its class's underrepresentation in the training set. This compensates for class imbalance during training and should make your network more robust to generalization error.
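As a minimal sketch of how such a dictionary is commonly built, here is the inverse-frequency heuristic (each class weighted by n_samples / (n_classes * class_count)); the helper name and example labels are illustrative, not part of the Keras API:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Map each class index to n_samples / (n_classes * class_count).

    Rarer classes receive proportionally larger weights, so their
    per-sample loss contributions are scaled up during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

# 90/10 imbalanced binary labels
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)  # {0: 0.555..., 1: 5.0}
```

The resulting dict is what you would pass to Keras, e.g. `model.fit(x_train, y_train, class_weight=weights)`.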
I'd exercise caution when physically dropping data instances corresponding to the most represented class, however - if your network is deep and therefore has significant representational capacity, culling your dataset can lead to overfitting, and consequently poor generalization to the validation/test sets.
I would recommend using the class_weight parameter as specified in the Keras documentation. If you really are intent on dropping data instances from the most represented class, ensure that you tune your network topology to decrease the model's representational capacity (e.g. add Dropout layers and/or L2 regularization).
Upvotes: 4