Reputation: 2817
At this line, a loss is calculated for each tower in the multi-GPU setup.
However, these losses are never averaged, and it appears that only the loss from the last GPU is returned as loss.
Is this on purpose (and if so, why), or is it a bug in the code?
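The pattern in question boils down to a Python variable being reassigned on every loop iteration. Here is a toy stand-in (plain Python with made-up values, not the actual tutorial code, which builds TensorFlow ops per tower) showing why loss ends up referring to the last tower:

```python
import numpy as np

num_gpus = 4
tower_losses = []

for i in range(num_gpus):
    # Stand-in for the per-tower loss on GPU i; the real code
    # builds a TensorFlow op here inside the tower's name scope.
    loss = np.float32(0.1 * (i + 1))
    tower_losses.append(loss)

# After the loop, the Python name `loss` points at the last tower's
# loss only -- this is what the question observes. The other towers'
# losses still exist (here collected in tower_losses).
print(loss)          # 0.4  (last tower only)
print(tower_losses)  # [0.1, 0.2, 0.3, 0.4]
```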
Upvotes: 1
Views: 412
Reputation: 5771
At this line, note that each loss is created in its own name scope (tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i))). So, if I understand correctly, it is not the case that only the loss from the last GPU is used; rather, the loss under the corresponding name scope of each GPU is used.
Each tower (one per GPU) has its own loss, which is used to compute that tower's gradients. The losses themselves are never averaged; instead, the gradients from all towers are averaged at line 196, as sketched below.
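Here is a minimal numpy paraphrase of that averaging step (the names average_gradients and tower_grads follow the tutorial; the shapes and values below are made up for illustration, and the real function operates on the (gradient, variable) pairs returned by opt.compute_gradients for each tower):

```python
import numpy as np

def average_gradients(tower_grads):
    """tower_grads: list over towers, each a list of (grad, var) pairs."""
    average_grads = []
    # zip(*tower_grads) groups together the gradients that all towers
    # computed for the same shared variable.
    for grad_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grad_and_vars]
        grad = np.mean(grads, axis=0)   # average over towers
        _, var = grad_and_vars[0]       # the variable is shared, take any
        average_grads.append((grad, var))
    return average_grads

# Two towers sharing one variable "w": each tower computed its own
# gradient from its own (un-averaged) loss; only the gradients get averaged.
w = "w"
tower_grads = [
    [(np.array([1.0, 2.0]), w)],  # gradients from tower 0's loss
    [(np.array([3.0, 4.0]), w)],  # gradients from tower 1's loss
]
print(average_gradients(tower_grads))  # [(array([2., 3.]), 'w')]
```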
Note that in this figure from the tutorial, the individual losses are not aggregated anywhere; it is the gradients that are averaged.
Upvotes: 2