Reputation: 6538
I am training a recurrent binary classifier on a significantly underrepresented target class. Let's say the target class 1 represents <1% of all the training data we have, and class 0 the remaining >99%. To punish the model more for mispredicting the minority class, I'd like to use weights in the loss function. For each minibatch, I create a corresponding minibatch of weights, where the target class gets a weight scalar >1.0 and the majority class <1.0 accordingly. For example, in the code below we use 2.0 for class 1 and 0.5 for class 0.
loss_sum = 0.0
for t, o, tw in zip(self._targets_uns, self._logits_uns, self._targets_weight_uns):
    # t  -- targets tensor [batch_size x 1]
    # o  -- logits tensor  [batch_size x 1]
    # tw -- weights tensor [batch_size x 1]
    # e.g. t = [0, 0, 0, 0, 1, 1, 0] -> tw = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0, 0.5]
    _loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw, label_smoothing=0,
                                            scope="sigmoid_cross_entropy",
                                            loss_collection=tf.GraphKeys.LOSSES)
    loss_sum += _loss
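For completeness, a simplified sketch of how such a weight minibatch can be derived from the targets (assuming TensorFlow 1.x and 0/1 float targets; the weight values are just the ones used above):

import tensorflow as tf

minority_weight = 2.0  # scalar for class 1 (illustrative value)
majority_weight = 0.5  # scalar for class 0 (illustrative value)

# stand-in for one minibatch of targets, shape [batch_size x 1]
t = tf.constant([[0.0], [0.0], [1.0]])

# map 1 -> minority_weight, 0 -> majority_weight, per sample
tw = t * minority_weight + (1.0 - t) * majority_weight
# here: tw = [[0.5], [0.5], [2.0]]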
Once the model is trained, I check the prediction accuracy and find that it is slightly lower than the accuracy without weights. I keep experimenting with weight pairs such as [1.4, 0.8], [1.6, 0.4], [4.0, 0.1], [3.0, 1.0], and so on. However, I get no improvement over the unweighted training, only marginal differences of 2-3% lower accuracy. OK, maybe I misunderstood the docs for the tf.losses.sigmoid_cross_entropy function:
weights acts as a coefficient for the loss. If a scalar is provided, then the loss is simply scaled by the given value. If weights is a tensor of shape [batch_size], then the loss weights apply to each corresponding sample.
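To sanity-check that reading, I would expect per-sample weights to scale each sample's loss before the reduction; a toy sketch (TF 1.x; the manual computation mirrors what I believe the default SUM_BY_NONZERO_WEIGHTS reduction does):

import tensorflow as tf

logits = tf.constant([[2.0], [-1.0], [0.5]])
labels = tf.constant([[1.0], [0.0], [1.0]])
w      = tf.constant([[2.0], [0.5], [2.0]])  # per-sample weights

weighted_loss = tf.losses.sigmoid_cross_entropy(labels, logits, weights=w)

# manual equivalent: element-wise loss, scaled per sample, then averaged
# over the number of non-zero weights (here all 3 are non-zero)
per_sample = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
manual = tf.reduce_sum(per_sample * w) / 3.0

with tf.Session() as sess:
    print(sess.run([weighted_loss, manual]))  # both values should agree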
So I reverse the pairs and use the higher weight for class 0 and the lower for class 1: [0.5, 2.0], [0.8, 1.3], [0.2, 1.0], and so on. This also does not provide any improvement; it is, if anything, slightly worse than the unweighted version.
Can somebody please explain the behaviour of a weighted loss? Am I doing it correctly, and what should I do to upweight the minority class?
Upvotes: 3
Views: 6177
Reputation: 2560
Weighting is a general mathematical technique for solving an over-specified system of equations of the form Wx = y, where x is the input vector, y is the output vector, and W is the transformation matrix you wish to find. Such problems are often solved using techniques such as SVD, which finds the solution for W by minimizing the least-squared error of the over-specified system. TensorFlow is essentially solving a similar problem through its minimization process.
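As a toy illustration of that idea (my own numpy sketch, not specific to TensorFlow): down-weighting a row of an over-specified system shrinks its contribution to the squared error that the solver minimizes.

import numpy as np

rng = np.random.default_rng(0)

# over-specified system: 100 equations, 2 unknowns
A = rng.standard_normal((100, 2))
y = A @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(100)

# weighted least squares: scaling each equation by sqrt(weight)
# reduces the influence of down-weighted rows on the solution
w = np.ones(100)
w[:99] = 0.01                      # down-weight 99 "majority" equations
sw = np.sqrt(w)
x, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)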
In your case, what is happening is that you have 1 sample of class A for every 99 samples of class B. Because the solving process works to minimize the overall error, class B contributes to the solution by a factor of 99 to class A's 1. To fix this, you should adjust your weights so that classes A and B contribute evenly to the solution, i.e., weight down class B by 0.01.
More generally, you can do:
ratio = num_B / (num_A + num_B)   # fraction of samples in the majority class B
weights = [ratio, 1.0 - ratio]    # [weight for minority class A, weight for majority class B]
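To connect this back to the question's loss code, those per-class weights can be mapped onto each sample by its label before being passed to the loss; a sketch (TF 1.x, with illustrative counts and stand-ins for the question's t/o tensors; note the weights are re-ordered here so they are indexed by label, 0 = majority class B, 1 = minority class A):

import tensorflow as tf

num_A, num_B = 1.0, 99.0                   # minority / majority counts (example)
ratio = num_B / (num_A + num_B)            # 0.99

class_weights = tf.constant([1.0 - ratio,  # weight for label 0 (majority class B)
                             ratio])       # weight for label 1 (minority class A)

# t: [batch_size x 1] 0/1 targets, o: [batch_size x 1] logits
t = tf.constant([[0.0], [0.0], [1.0]])
o = tf.constant([[1.5], [-0.3], [0.2]])

tw = tf.gather(class_weights, tf.cast(t, tf.int32))  # per-sample weights
loss = tf.losses.sigmoid_cross_entropy(t, o, weights=tw)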
Upvotes: 5