mesllo

Reputation: 583

When training a model over multiple GPUs on the same machine using PyTorch, how is the batch size divided?

Even after looking through the PyTorch forums, I'm still not certain about this one. Let's say I'm using PyTorch DDP to train a model over 4 GPUs on the same machine.

Suppose I choose a batch size of 8. Is the model effectively backpropagating over 2 examples per GPU at every step, so that the final result is that of a model trained with a batch size of 2? Or does it gather the gradients from each GPU at every step and backpropagate with an effective batch size of 8?

Upvotes: 0

Views: 320

Answers (1)

eval

Reputation: 1239

The actual batch size is the size of the input you feed to each worker, which in your case is 8. In other words, each worker runs backpropagation over 8 examples at every step.

For a concrete code example, see https://gist.github.com/sgraaf/5b0caa3a320f28c27c12b5efeb35aa4c#file-ddp_example-py-L63. The batch size set there is this per-worker batch size.
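To make this concrete, here is a minimal sketch of my own (not the linked gist) for the 4-GPU case in the question, assuming the NCCL backend and a dummy dataset and model. The `batch_size` passed to the `DataLoader` is per process, so each of the 4 GPUs backpropagates over 8 examples per step, and DDP averages the gradients across processes during `backward()`:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def train(rank: int, world_size: int):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Dummy dataset and model, just to illustrate the data flow.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    # DistributedSampler gives each rank a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # batch_size here is PER GPU: each rank sees 8 examples per step,
    # so the effective global batch is 8 * world_size.
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    model = DDP(torch.nn.Linear(10, 1).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(rank), y.cuda(rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # gradients are all-reduced (averaged) across ranks here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    world_size = 4  # 4 GPUs on one machine, as in the question
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```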

Upvotes: 0
