Reputation: 85
When training a network with more than one branch, and therefore more than one loss, the Keras documentation says that the global loss is a weighted sum of the partial losses, i.e. final_loss = w1*loss1 + w2*loss2.
However, I trained a model consisting of two branches, compiled with a categorical cross-entropy loss for each branch and with loss_weights=[1., 1.]. I expected the global loss to be the average of the two partial losses (since they are equally weighted), but that is not the case: I get a relatively high global loss and cannot work out how it is computed from the partial losses and their weights. Could anyone explain how the global loss is computed with these parameters? And should the loss weights sum to at most 1 (i.e. should I use loss_weights=[0.5, 0.5] instead)? I would be very grateful for any help, as I have been blocked on this for a long time.
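For reference, here is a minimal sketch of the kind of setup I mean (the toy architecture and the optimizer are placeholders, not my real model; only the loss configuration matters):
from tensorflow import keras

# Toy stand-in for a two-branch model; the output names match the
# dense_1_loss/dense_2_loss entries in the logs below.
inputs = keras.Input(shape=(32,))
shared = keras.layers.Dense(64, activation='relu')(inputs)
branch1 = keras.layers.Dense(10, activation='softmax', name='dense_1')(shared)
branch2 = keras.layers.Dense(10, activation='softmax', name='dense_2')(shared)
model = keras.Model(inputs, [branch1, branch2])

# 'adam' is a placeholder optimizer; both branches use categorical
# cross-entropy, weighted equally:
model.compile(optimizer='adam',
              loss=['categorical_crossentropy', 'categorical_crossentropy'],
              loss_weights=[1., 1.])
The following are some training values from my actual model: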
Epoch 2/200
26/26 [==============================] - 39s 1s/step - loss: 9.2902 -
dense_1_loss: 0.0801 - dense_2_loss: 0.0717 -
Epoch 3/200
26/26 [==============================] - 39s 1s/step - loss: 8.2261 -
dense_1_loss: 0.0251 - dense_2_loss: 0.0199 -
Epoch 4/200
26/26 [==============================] - 39s 2s/step - loss: 7.3107 -
dense_1_loss: 0.0595 - dense_2_loss: 0.0048 -
Epoch 5/200
26/26 [==============================] - 39s 1s/step - loss: 6.4586 -
dense_1_loss: 0.0560 - dense_2_loss: 0.0025 -
Epoch 6/200
26/26 [==============================] - 39s 1s/step - loss: 5.9463 -
dense_1_loss: 0.1964 - dense_2_loss: 0.0653 -
Epoch 7/200
26/26 [==============================] - 39s 1s/step - loss: 5.3730 -
dense_1_loss: 0.1722 - dense_2_loss: 0.0447 -
Epoch 8/200
26/26 [==============================] - 39s 1s/step - loss: 4.8407 -
dense_1_loss: 0.1396 - dense_2_loss: 0.0169 -
Epoch 9/200
26/26 [==============================] - 39s 1s/step - loss: 4.4465 -
dense_1_loss: 0.1614 - dense_2_loss: 0.0124 -
Epoch 10/200
26/26 [==============================] - 39s 2s/step - loss: 3.9898 -
dense_1_loss: 0.0588 - dense_2_loss: 0.0119 -
Epoch 11/200
26/26 [==============================] - 39s 1s/step - loss: 3.6347 -
dense_1_loss: 0.0302 - dense_2_loss: 0.0085 -
Upvotes: 1
Views: 493
Reputation: 3278
Correct. The global loss is the weighted sum of the two partial losses:
global_loss = loss1 * weight1 + loss2 * weight2
The weights do not need to sum to 1; they simply scale each loss's contribution to the total.
I have used a Keras functional model to demonstrate that the global loss is the weighted sum of the two partial losses. Please take a look at the entire code here.
Model compiled as
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=[keras.losses.BinaryCrossentropy(from_logits=True),
                    keras.losses.CategoricalCrossentropy(from_logits=True)],
              loss_weights=[1., 0.2])
Model trained as
model.fit({'title': title_data, 'body': body_data, 'tags': tags_data},
          {'priority': priority_targets, 'department': dept_targets},
          epochs=2, batch_size=32)
Epoch 1/2
40/40 [==============================] - 2s 45ms/step - loss: 1.2723 - priority_loss: 0.7062 - department_loss: 2.8304
Epoch 2/2
40/40 [==============================] - 2s 46ms/step - loss: 1.2593 - priority_loss: 0.6995 - department_loss: 2.7993
Check how the weights and the two losses combine into the overall loss: loss1*weight1 + loss2*weight2 = 0.7062*1.0 + 2.8304*0.2 = 1.27228, which matches the reported loss of 1.2723.
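You can verify this with a couple of lines of plain Python:
# Sanity check of the weighted sum, using the epoch 1 values above:
weights = [1.0, 0.2]
losses = [0.7062, 2.8304]  # priority_loss, department_loss
print(sum(w * l for w, l in zip(weights, losses)))  # 1.27228 ~ reported 1.2723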
Hope this helps.
Upvotes: 1