Reputation: 16607
When the batch size is 1 or 2 and we have 8 GPUs, how does torch.distributed.launch assign data to each GPU? I converted my model to torch.nn.parallel.DistributedDataParallel:
model = DistributedDataParallel(
    model,
    device_ids=[args.local_rank],
    output_device=args.local_rank,
    find_unused_parameters=True,
)
but the documentation states that DistributedDataParallel:
parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension.
My question is: when the batch size is smaller than the number of GPUs, how does it deal with that?
Upvotes: 1
Views: 1240
Reputation: 4826
It doesn't split the batch at all. Unlike DataParallel, the batch size you set with DistributedDataParallel is per GPU. When you have 8 GPUs with a batch size of 1, you have an effective batch size of 8.
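As a minimal sketch (assuming the usual single-node torch.distributed.launch / torchrun setup with one process per GPU; the toy Linear model and TensorDataset here are just placeholders), each process builds its own DataLoader over its own shard of the data, so batch_size=1 means 1 sample per GPU per step and 8 samples per optimizer step in total:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Launched as e.g.:
#   python -m torch.distributed.launch --nproc_per_node=8 train.py
# (or torchrun --nproc_per_node=8 train.py); each process drives one GPU.

def main():
    dist.init_process_group(backend="nccl")
    # Single-node assumption: derive the local GPU index from the global rank.
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Toy dataset and model purely for illustration.
    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    model = torch.nn.Linear(10, 1).cuda(local_rank)
    model = DistributedDataParallel(model, device_ids=[local_rank],
                                    output_device=local_rank)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # The sampler gives each rank a disjoint shard of the dataset;
    # batch_size=1 is per process, so 8 GPUs -> effective batch of 8.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=1, sampler=sampler)

    for x, y in loader:
        optimizer.zero_grad()
        out = model(x.cuda(local_rank))   # forward on this rank's own sample
        loss = torch.nn.functional.mse_loss(out, y.cuda(local_rank))
        loss.backward()                   # gradients are all-reduced across the 8 ranks
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

DDP never splits a single batch across devices (in the standard one-GPU-per-process setup); if you truly want a total batch of 1 across all GPUs, you would have to launch fewer processes.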
Upvotes: 1