Reputation: 141
Are there any recommended ways to make the PyTorch DataLoader (torch.utils.data.DataLoader) work in a distributed environment, on a single machine and across multiple machines? Can it be done without DistributedDataParallel?
Upvotes: 2
Views: 4178
Reputation: 31
Maybe you need to make your question clearer. DistributedDataParallel (abbreviated DDP) is what you use to train a model in a distributed environment; this question seems to be about how to arrange the dataset loading process for distributed training.
First of all, data.DataLoader works for both distributed and non-distributed training; usually there is no need to change anything there.
But the sampling strategy differs between the two modes: you need to pass a sampler to the DataLoader (the sampler argument of data.DataLoader), and using torch.utils.data.distributed.DistributedSampler is the simplest way.
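A minimal sketch of this, assuming the process group has already been initialized with torch.distributed.init_process_group and using a made-up MyDataset class as a stand-in for your own dataset:

import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical dataset purely for illustration; replace with your own Dataset.
class MyDataset(Dataset):
    def __init__(self, size=1000):
        self.data = torch.randn(size, 10)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

dataset = MyDataset()

# DistributedSampler splits the dataset indices across processes; it reads
# the world size and rank from the default process group, which is assumed
# to have been initialized earlier via torch.distributed.init_process_group.
sampler = DistributedSampler(dataset)

# Pass the sampler instead of shuffle=True; shuffling is handled by the sampler.
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(10):
    # Call set_epoch so each epoch uses a different shuffling order.
    sampler.set_epoch(epoch)
    for batch in loader:
        pass  # training step here

The same DataLoader code works on a single machine or across machines; only the process-group initialization differs. Note that DistributedSampler requires an initialized process group, so if you really want to avoid DDP you would still need torch.distributed for the sampler, or split the data among workers yourself.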
Upvotes: 3