heyoma

Reputation: 11

How to design a joint loss function with two components, with the aim of minimizing the first loss but maximizing the second?

I'm trying to run an experiment with two subtasks, where the aim is to reduce the error rate of the first task while increasing the error rate of the second task at the same time.

This setting may be similar to that of multi-task learning or adversarial learning. My current loss function is as follows:

total_loss = loss1 - alpha * loss2

where I added a weight alpha to make sure that the second loss doesn't completely drown out the influence of loss1.
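To make the setup concrete, here is a minimal PyTorch sketch of how such a combined loss would be computed in a training step; the two-head model, dummy data, and alpha value are all hypothetical placeholders, not the poster's actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 4)               # stand-in shared backbone
x = torch.randn(8, 10)                 # dummy input batch
y1 = torch.randint(0, 2, (8,))         # labels for task 1
y2 = torch.randint(0, 2, (8,))         # labels for task 2

out = model(x)
out1, out2 = out[:, :2], out[:, 2:]    # pretend these are two task heads
loss1 = F.cross_entropy(out1, y1)      # to be minimized
loss2 = F.cross_entropy(out2, y2)      # to be maximized
alpha = 0.5
total_loss = loss1 - alpha * loss2     # the combined objective in question
total_loss.backward()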

The result shows that, after training for a few epochs, the total loss becomes negative and keeps decreasing very quickly. I assume this is because loss1 is already close to 0 while the -alpha * loss2 term keeps shrinking as loss2 grows (increasing the error rate is much easier than reducing it).

I have never read a paper where a negative loss term is added to the original loss function, so I am wondering whether it is appropriate to use such a loss function, or whether there is a better design for my experimental setting. Is there any paper with a similar optimization aim?

Upvotes: 1

Views: 1236

Answers (1)

Mercury

Reputation: 4146

First, let me explain why your loss will not work and why it drops sharply into the negatives.

total_loss = loss1 - alpha*loss2

You want to minimize loss1 and maximize loss2, and you have combined these two opposing objectives into a single total_loss.

You are then presumably training your model/system by minimizing total_loss. As it stands, it doesn't matter what alpha you use. The theoretical minimum for a typical loss (cross-entropy, MSE) is 0, but because of the negative term in your loss, total_loss can be driven toward negative infinity, so nothing stops it from exploding in the negative direction.
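To see this concretely, here is a toy calculation with made-up values: once loss1 has bottomed out near 0, nothing stops total_loss from falling forever as loss2 grows.

# Toy illustration with made-up values: the combined loss has no floor.
loss1, alpha = 0.01, 1.0
for loss2 in (1.0, 10.0, 100.0, 1000.0):
    print(loss1 - alpha * loss2)  # -0.99, -9.99, -99.99, -999.99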

Now that we have the explanation, we can think about potential solutions. Since the problem is that your loss tends toward negative infinity, we have to find some other operation whose output decreases as its input increases but stays bounded below.

If we keep it simple, we could just try using an inverse.

total_loss = loss_1 + 1 / (loss_2 + epsilon)

With this objective, the optimizer is encouraged to maximize loss_2 in order to drive the 1 / (loss_2 + epsilon) term toward 0; epsilon is a small constant that avoids division by zero.
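A minimal PyTorch sketch of this inverse formulation, with hypothetical scalar losses standing in for the two task losses:

import torch

loss_1 = torch.tensor(0.3, requires_grad=True)  # hypothetical task-1 loss
loss_2 = torch.tensor(2.0, requires_grad=True)  # hypothetical task-2 loss

epsilon = 1e-8
# as loss_2 grows, 1 / (loss_2 + epsilon) decays toward 0
total_loss = loss_1 + 1.0 / (loss_2 + epsilon)
total_loss.backward()

Note that the pressure to increase loss_2 fades as it grows (the gradient magnitude falls off like 1/loss_2^2), which is exactly what keeps the total from diverging.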

Another option could be using tanh, which is bounded in (-1, 1). Sigmoid, which is bounded in (0, 1), can be used as well.

total_loss = loss_1 + 1 - tanh(loss_2)
total_loss = loss_1 + 1 - sigmoid(loss_2)
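A sketch of both bounded variants in the same hedged style, again with placeholder scalar losses; because tanh and sigmoid saturate at 1, the subtracted term can never exceed 1, so total_loss can never drop below loss_1:

import torch

loss_1 = torch.tensor(0.3, requires_grad=True)  # hypothetical task-1 loss
loss_2 = torch.tensor(2.0, requires_grad=True)  # hypothetical task-2 loss

# 1 - tanh(loss_2) decays from 1 toward 0 as loss_2 grows
total_loss = loss_1 + 1.0 - torch.tanh(loss_2)
# sigmoid variant: 1 - sigmoid(loss_2) decays from 0.5 toward 0
total_loss_alt = loss_1 + 1.0 - torch.sigmoid(loss_2)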

There are probably other better ways to do this too.

Lastly, you need to revisit some questions. Any learning problem has an end goal: what is your theoretical optimum? Is it that system 1's loss is minimized to ~0 while system 2's loss is maximized to infinity (or some large value)? Does system 2 start from an optimal position in the first place?

I believe you should also review your approach. Look into adversarial learning approaches, like GANs.
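As a hedged sketch of what that GAN-style alternative can look like: instead of summing the two objectives into one scalar, alternate updates, assuming the two losses have separate parameter groups, as in a generator/discriminator split. All modules and data here are hypothetical placeholders:

import torch
import torch.nn as nn
import torch.nn.functional as F

net1 = nn.Linear(10, 2)  # parameters tied to the loss we minimize
net2 = nn.Linear(10, 2)  # parameters tied to the loss we maximize
opt1 = torch.optim.Adam(net1.parameters(), lr=1e-3)
opt2 = torch.optim.Adam(net2.parameters(), lr=1e-3)

x = torch.randn(8, 10)
y = torch.randint(0, 2, (8,))

# step 1: minimize loss1 with respect to net1's parameters
opt1.zero_grad()
loss1 = F.cross_entropy(net1(x), y)
loss1.backward()
opt1.step()

# step 2: maximize loss2 by descending on its negation
opt2.zero_grad()
loss2 = F.cross_entropy(net2(x), y)
(-loss2).backward()
opt2.step()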

Upvotes: 1
