Reputation: 6538
I am training a recurrent binary classifier on a significantly underrepresented target class. Let's say the target class 1 represents <1% of all the training data we have, and class 0 the remaining >99%. To punish the model more for mispredicting the minority class, I'd like to use weights in the loss function. For each minibatch, I create a corresponding minibatch of weights, where the target class gets a weight scalar >1.0 and the majority class <1.0 accordingly. For example, in the code below we use 2.0 for class 1 and 0.5 for class 0.
loss_sum = 0.0
for t, o, tw in zip(self._targets_uns, self._logits_uns, self._targets_weight_uns):
    # t  -- targets tensor [batch_size x 1]
    # o  -- logits tensor  [batch_size x 1]
    # tw -- weights tensor [batch_size x 1]
    # e.g. t = [0, 0, 0, 0, 1, 1, 0] -> tw = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 0.5]
    _loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw, label_smoothing=0,
                                            scope="sigmoid_cross_entropy",
                                            loss_collection=tf.GraphKeys.LOSSES)
    loss_sum += _loss
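For completeness, a simplified sketch of how such a weight minibatch can be derived from the targets (assuming TensorFlow 1.x and 0/1 float targets; the weight values are just the ones used above):

import tensorflow as tf

minority_weight = 2.0  # scalar for class 1 (illustrative value)
majority_weight = 0.5  # scalar for class 0 (illustrative value)

# stand-in for one minibatch of targets, shape [batch_size x 1]
t = tf.constant([[0.0], [0.0], [1.0]])

# map 1 -> minority_weight, 0 -> majority_weight, per sample
tw = t * minority_weight + (1.0 - t) * majority_weight
# here: tw = [[0.5], [0.5], [2.0]]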
Once the model is trained, I check the prediction accuracy and find that it is slightly lower than the accuracy without weights. I keep experimenting with weight pairs such as [1.4, 0.8], [1.6, 0.4], [4.0, 0.1], [3.0, 1.0], and so on. However, I get no improvement over the unweighted training, only marginal differences of 2-3% lower accuracy. OK, maybe I misunderstood the docs for the tf.losses.sigmoid_cross_entropy function:
weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
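To sanity-check that reading, I would expect per-sample weights to scale each sample's loss before the reduction; a toy sketch (TF 1.x; the manual computation mirrors what I believe the default SUM_BY_NONZERO_WEIGHTS reduction does):

import tensorflow as tf

logits = tf.constant([[2.0], [-1.0], [0.5]])
labels = tf.constant([[1.0], [0.0], [1.0]])
w      = tf.constant([[2.0], [0.5], [2.0]])  # per-sample weights

weighted_loss = tf.losses.sigmoid_cross_entropy(labels, logits, weights=w)

# manual equivalent: element-wise loss, scaled per sample, then averaged
# over the number of non-zero weights (here all 3 are non-zero)
per_sample = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
manual = tf.reduce_sum(per_sample * w) / 3.0

with tf.Session() as sess:
    print(sess.run([weighted_loss, manual]))  # both values should agree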
So I reverse the pairs and use the higher weight for class 0 and the lower for class 1: [0.5, 2.0], [0.8, 1.3], [0.2, 1.0], and so on. This also does not provide any improvement; it is, if anything, slightly worse than the unweighted version.
Can somebody please explain the behaviour of a weighted loss? Am I doing it correctly, and what should I do to upweight the minority class?
Upvotes: 3
Views: 6177
Reputation: 2560
Weighting is a general mathematical technique for solving an over-specified system of equations of the form Wx = y, where x is the input vector, y is the output vector, and W is the transformation matrix you wish to find. Such problems are often solved using techniques such as SVD, which finds the solution for W by minimizing the least-squared error of the over-specified system. TensorFlow is essentially solving a similar problem through its minimization process.
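As a toy illustration of that idea (my own numpy sketch, not specific to TensorFlow): down-weighting a row of an over-specified system shrinks its contribution to the squared error that the solver minimizes.

import numpy as np

rng = np.random.default_rng(0)

# over-specified system: 100 equations, 2 unknowns
A = rng.standard_normal((100, 2))
y = A @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(100)

# weighted least squares: scaling each equation by sqrt(weight)
# reduces the influence of down-weighted rows on the solution
w = np.ones(100)
w[:99] = 0.01                      # down-weight 99 "majority" equations
sw = np.sqrt(w)
x, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)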
In your case, what is happening is that you have 1 sample of class A for every 99 samples of class B. Because the solving process works to minimize the overall error, class B contributes to the solution by a factor of 99 to class A's 1. To fix this, you should adjust your weights so that classes A and B contribute evenly to the solution, i.e., weight down class B by 0.01.
More generally, you can do:
ratio = num_B / (num_A + num_B)   # fraction of samples in the majority class B
weights = [ratio, 1.0 - ratio]    # [weight for minority class A, weight for majority class B]
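To connect this back to the question's loss code, those per-class weights can be mapped onto each sample by its label before being passed to the loss; a sketch (TF 1.x, with illustrative counts and stand-ins for the question's t/o tensors; note the weights are re-ordered here so they are indexed by label, 0 = majority class B, 1 = minority class A):

import tensorflow as tf

num_A, num_B = 1.0, 99.0                   # minority / majority counts (example)
ratio = num_B / (num_A + num_B)            # 0.99

class_weights = tf.constant([1.0 - ratio,  # weight for label 0 (majority class B)
                             ratio])       # weight for label 1 (minority class A)

# t: [batch_size x 1] 0/1 targets, o: [batch_size x 1] logits
t = tf.constant([[0.0], [0.0], [1.0]])
o = tf.constant([[1.5], [-0.3], [0.2]])

tw = tf.gather(class_weights, tf.cast(t, tf.int32))  # per-sample weights
loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw)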
Upvotes: 5