Reputation: 2817
At this line, a loss is calculated for each tower in the multi-GPU setup.
However, these losses are never averaged, and it appears that only the loss from the last GPU is returned as loss.
Is this on purpose (and if so, why), or is it a bug in the code?
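The pattern in question boils down to a Python variable being reassigned on every loop iteration. Here is a toy stand-in (plain Python with made-up values, not the actual tutorial code, which builds TensorFlow ops per tower) showing why loss ends up referring to the last tower:

```python
import numpy as np

num_gpus = 4
tower_losses = []

for i in range(num_gpus):
    # Stand-in for the per-tower loss on GPU i; the real code
    # builds a TensorFlow op here inside the tower's name scope.
    loss = np.float32(0.1 * (i + 1))
    tower_losses.append(loss)

# After the loop, the Python name `loss` points at the last tower's
# loss only -- this is what the question observes. The other towers'
# losses still exist (here collected in tower_losses).
print(loss)          # 0.4  (last tower only)
print(tower_losses)  # [0.1, 0.2, 0.3, 0.4]
```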
Upvotes: 1
Views: 412
Reputation: 5771
At this line, note that each loss is created in its own name scope (tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i))). So, if I understand correctly, it is not the case that only the loss from the last GPU is used; rather, the loss under the corresponding name scope of each GPU is used.
Each tower (one per GPU) has its own loss, which is used to compute that tower's gradients. The losses themselves are never averaged; instead, the gradients from all towers are averaged at line 196, as sketched below.
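Here is a minimal numpy paraphrase of that averaging step (the names average_gradients and tower_grads follow the tutorial; the shapes and values below are made up for illustration, and the real function operates on the (gradient, variable) pairs returned by opt.compute_gradients for each tower):

```python
import numpy as np

def average_gradients(tower_grads):
    """tower_grads: list over towers, each a list of (grad, var) pairs."""
    average_grads = []
    # zip(*tower_grads) groups together the gradients that all towers
    # computed for the same shared variable.
    for grad_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grad_and_vars]
        grad = np.mean(grads, axis=0)   # average over towers
        _, var = grad_and_vars[0]       # the variable is shared, take any
        average_grads.append((grad, var))
    return average_grads

# Two towers sharing one variable "w": each tower computed its own
# gradient from its own (un-averaged) loss; only the gradients get averaged.
w = "w"
tower_grads = [
    [(np.array([1.0, 2.0]), w)],  # gradients from tower 0's loss
    [(np.array([3.0, 4.0]), w)],  # gradients from tower 1's loss
]
print(average_gradients(tower_grads))  # [(array([2., 3.]), 'w')]
```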
Note that in this figure from the tutorial, the individual losses are not aggregated anywhere; it is the gradients that are averaged.
Upvotes: 2