Reputation: 85
When training a network with more than one branch, and therefore more than one loss, the Keras documentation says that the global loss is a weighted sum of the partial losses, i.e. final_loss = w1*loss1 + w2*loss2.
However, I trained a model consisting of two branches, compiled with a categorical cross-entropy loss for each branch and with loss_weights=[1., 1.]. I expected the global loss to be the average of the two partial losses (since they are equally weighted), but that is not the case: I get a relatively high global loss and cannot work out how it is computed from the partial losses and their weights. Could anyone explain how the global loss is computed with these parameters? And should the loss weights sum to at most 1 (i.e. should I use loss_weights=[0.5, 0.5] instead)? I would be very grateful for any help, as I have been blocked on this for a long time.
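For reference, here is a minimal sketch of the kind of setup I mean (the toy architecture and the optimizer are placeholders, not my real model; only the loss configuration matters):
from tensorflow import keras

# Toy stand-in for a two-branch model; the output names match the
# dense_1_loss/dense_2_loss entries in the logs below.
inputs = keras.Input(shape=(32,))
shared = keras.layers.Dense(64, activation='relu')(inputs)
branch1 = keras.layers.Dense(10, activation='softmax', name='dense_1')(shared)
branch2 = keras.layers.Dense(10, activation='softmax', name='dense_2')(shared)
model = keras.Model(inputs, [branch1, branch2])

# 'adam' is a placeholder optimizer; both branches use categorical
# cross-entropy, weighted equally:
model.compile(optimizer='adam',
              loss=['categorical_crossentropy', 'categorical_crossentropy'],
              loss_weights=[1., 1.])
The following are some training values from my actual model: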
Epoch 2/200
26/26 [==============================] - 39s 1s/step - loss: 9.2902 -
dense_1_loss: 0.0801 - dense_2_loss: 0.0717 -
Epoch 3/200
26/26 [==============================] - 39s 1s/step - loss: 8.2261 -
dense_1_loss: 0.0251 - dense_2_loss: 0.0199 -
Epoch 4/200
26/26 [==============================] - 39s 2s/step - loss: 7.3107 -
dense_1_loss: 0.0595 - dense_2_loss: 0.0048 -
Epoch 5/200
26/26 [==============================] - 39s 1s/step - loss: 6.4586 -
dense_1_loss: 0.0560 - dense_2_loss: 0.0025 -
Epoch 6/200
26/26 [==============================] - 39s 1s/step - loss: 5.9463 -
dense_1_loss: 0.1964 - dense_2_loss: 0.0653 -
Epoch 7/200
26/26 [==============================] - 39s 1s/step - loss: 5.3730 -
dense_1_loss: 0.1722 - dense_2_loss: 0.0447 -
Epoch 8/200
26/26 [==============================] - 39s 1s/step - loss: 4.8407 -
dense_1_loss: 0.1396 - dense_2_loss: 0.0169 -
Epoch 9/200
26/26 [==============================] - 39s 1s/step - loss: 4.4465 -
dense_1_loss: 0.1614 - dense_2_loss: 0.0124 -
Epoch 10/200
26/26 [==============================] - 39s 2s/step - loss: 3.9898 -
dense_1_loss: 0.0588 - dense_2_loss: 0.0119 -
Epoch 11/200
26/26 [==============================] - 39s 1s/step - loss: 3.6347 -
dense_1_loss: 0.0302 - dense_2_loss: 0.0085 -
Upvotes: 1
Views: 493
Reputation: 3278
Correct. The global loss is the weighted sum of the two partial losses:
global_loss = loss1 * weight1 + loss2 * weight2
The weights do not need to sum to 1; they simply scale each loss's contribution to the total.
I have used a Keras functional model to demonstrate that the global loss is the weighted sum of the two partial losses. Please take a look at the entire code here.
Model compiled as
model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=[keras.losses.BinaryCrossentropy(from_logits=True),
                    keras.losses.CategoricalCrossentropy(from_logits=True)],
              loss_weights=[1., 0.2])
Model trained as
model.fit({'title': title_data, 'body': body_data, 'tags': tags_data},
          {'priority': priority_targets, 'department': dept_targets},
          epochs=2, batch_size=32)
Epoch 1/2
40/40 [==============================] - 2s 45ms/step - loss: 1.2723 - priority_loss: 0.7062 - department_loss: 2.8304
Epoch 2/2
40/40 [==============================] - 2s 46ms/step - loss: 1.2593 - priority_loss: 0.6995 - department_loss: 2.7993
Check how the weights and the two losses combine into the overall loss: loss1*weight1 + loss2*weight2 = 0.7062*1.0 + 2.8304*0.2 = 1.27228, which matches the reported loss of 1.2723.
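You can verify this with a couple of lines of plain Python:
# Sanity check of the weighted sum, using the epoch 1 values above:
weights = [1.0, 0.2]
losses = [0.7062, 2.8304]  # priority_loss, department_loss
print(sum(w * l for w, l in zip(weights, losses)))  # 1.27228 ~ reported 1.2723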
Hope this helps.
Upvotes: 1