Mikhail_Sam

Reputation: 11208

PyTorch: Loading a sample of images using DataLoader

I use the standard DataLoader from torch.utils.data. I create a dataset class and then build the DataLoader this way:

import os
from torch.utils import data

train_dataset = LandmarksDataset(os.path.join(args.data, 'train'), train_transforms, split="train")
train_dataloader = data.DataLoader(train_dataset, batch_size=args.batch_size, num_workers=2,
                                   pin_memory=True, shuffle=True, drop_last=True)

It works perfectly, but the dataset is quite big (300k images), so reading the images through the DataLoader takes a lot of time. Building such a big DataLoader during debugging is really painful! I just want to test some of my hypotheses quickly, and I don't need to load the whole dataset for that.

I'm trying to find a way to load just a small, fixed part of the dataset without building a DataLoader on the whole dataset. At the moment, my only idea is to create another folder, copy part of the images there, and run the pipeline on it. But I suppose PyTorch is clever enough to have some built-in method for loading just a part of the images from a big dataset. Can you give me advice on how to do this?

Upvotes: 0

Views: 580

Answers (1)

Vlad Sirbu

Reputation: 138

As far as I am aware, there's no mechanism that does this for you. The problem is in the LandmarksDataset class, at the point where you read the paths of your train data folder; I assume you use os.listdir(train_data_folder).
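If the dataset class does build its file list with os.listdir, as assumed above, one minimal change is to sort the listing and keep only the first few entries, so every debug run sees the same fixed subset. The helper name `list_image_paths` and its `limit` parameter are hypothetical; the real path-reading code of LandmarksDataset is not shown in the question.

```python
import os
from typing import List, Optional


def list_image_paths(folder: str, limit: Optional[int] = None) -> List[str]:
    """Return sorted paths of the regular files in `folder`, optionally capped at `limit`.

    Hypothetical helper, not part of the original LandmarksDataset.
    """
    # os.listdir returns entries in arbitrary order; sorting makes the subset reproducible.
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    if limit is not None:
        names = names[:limit]  # keep only the first `limit` files
    return [os.path.join(folder, name) for name in names]
```

Assuming the dataset decodes each image lazily in `__getitem__`, a capped path list like this is enough to get a small debug dataset without copying images into another folder.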

Instead, you could use os.scandir(train_data_folder), which returns an iterator; each call to next() on it gives you the next directory entry, whose .path is the path to one of your images in the train data folder. This way you can call next() as many times as you need to build a subset, without changing the structure of your train data folder.
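A sketch of the os.scandir idea, under a few assumptions: itertools.islice calls next() on the scandir iterator for you and stops after `limit` files, so the remaining entries are never iterated. The class name `FirstNImagesDataset` and its `limit` argument are hypothetical. scandir yields entries in filesystem order, not sorted order, so which files are picked depends on the filesystem. The class only implements `__len__` and `__getitem__`; in real code it would subclass torch.utils.data.Dataset and open the image (and apply the transforms) in `__getitem__`, which is left out here to keep the sketch dependency-free.

```python
import itertools
import os
from typing import List


class FirstNImagesDataset:
    """Map-style dataset over at most `limit` files from `folder` (hypothetical sketch)."""

    def __init__(self, folder: str, limit: int) -> None:
        # os.scandir is lazy; islice stops pulling entries once `limit` files are found.
        # The `with` block closes the underlying directory handle.
        with os.scandir(folder) as entries:
            files = (entry.path for entry in entries if entry.is_file())
            self.paths: List[str] = list(itertools.islice(files, limit))

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> str:
        # Placeholder: a real dataset would open self.paths[index] here
        # (e.g. with PIL) and apply the training transforms.
        return self.paths[index]
```

A capped dataset built this way can be passed to the same DataLoader call as the full one.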

Upvotes: 1
