Reputation: 583
Even after looking through the PyTorch forums I'm still not certain about this one. Let's say I'm using PyTorch DDP to train a model over 4 GPUs on the same machine, and suppose I choose a batch size of 8. Is the model effectively backpropagating over 2 examples every step, so that the final result is a model trained with a batch size of 2? Or does it gather the gradients from each GPU at every step and backpropagate with a batch size of 8?
Upvotes: 0
Views: 320
Reputation: 1239
The actual batch size is the size of the input you feed to each worker, which in your case is 8. In other words, backpropagation runs over 8 examples on each worker at every step.
For a concrete code example, see line 63 of https://gist.github.com/sgraaf/5b0caa3a320f28c27c12b5efeb35aa4c#file-ddp_example-py-L63: the batch_size passed to the DataLoader there is this per-worker batch size.
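Below is a minimal sketch (not the linked gist) of how the per-worker batch size shows up in a single-node DDP script, assuming it is launched with `torchrun --nproc_per_node=4`. The tiny linear model and random dataset are placeholders; the point is that each of the 4 processes feeds its own DataLoader with `batch_size=8`, and DDP synchronizes (averages) the gradients across processes before each optimizer step.

```python
# Minimal DDP sketch, assuming launch via: torchrun --nproc_per_node=4 script.py
# Model and dataset are placeholders just to show where the per-worker batch size enters.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def train(local_rank: int) -> None:
    dist.init_process_group(backend="nccl")  # one process per GPU
    torch.cuda.set_device(local_rank)

    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)  # shards the dataset across the 4 workers
    # batch_size=8 is *per worker*: each process backpropagates over 8 examples
    # every step, so one optimizer step covers 4 * 8 = 32 examples across the job.
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = DDP(torch.nn.Linear(10, 1).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for x, y in loader:
        x, y = x.cuda(local_rank), y.cuda(local_rank)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()  # DDP all-reduces (averages) gradients across GPUs here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    train(int(os.environ["LOCAL_RANK"]))  # LOCAL_RANK is set by torchrun
```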
Upvotes: 0