JohnJ

Reputation: 7056

concat datasets in pytorch

I have a few datasets in folders, and I am concatenating them using ConcatDataset. My data folders look like this (note that folders 1, 2 and 3 each have only one class rather than two):

- denotes subfolders

folder0
-cats
-dogs

folder1
-cats

folder2
-cats

folder3
-dogs

and then I do this:

    trainset1 = datasets.ImageFolder(folder0, loader=my_loader, transform=SomeAug())    
    trainset2 = datasets.ImageFolder(folder1, loader=my_loader, transform=SomeAug())    
    trainset3 = datasets.ImageFolder(folder2, loader=my_loader, transform=SomeAug())    
    trainset = torch.utils.data.ConcatDataset([trainset1, trainset2, trainset3])

Is this the legit way of doing this? When I look at the total images via:

    len(train_loader.dataset)

it adds up correctly.

However, when I do:

    print(trainset.classes)

it throws me:

    AttributeError: 'ConcatDataset' object has no attribute 'classes'

which it does not when I use just one dataset.

I just want to make sure there are no gotchas in using this ConcatDataset approach.

Upvotes: 2

Views: 2650

Answers (1)

DerekG

Reputation: 3958

ImageFolder inherits from DatasetFolder, whose constructor calls the find_classes method to populate the classes attribute (along with class_to_idx). That is why accessing classes on a single ImageFolder, for example trainset1.classes, works without error.
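To make it concrete where classes comes from, here is a rough stdlib-only approximation of that lookup: the sorted subfolder names become the class list, and each name gets an index. The function name list_classes is made up for this sketch, and the real torchvision code may do additional checks, so treat it as an illustration rather than the library's implementation.

```python
import os
import tempfile


def list_classes(directory):
    """Approximate how an ImageFolder-style dataset derives its classes.

    Hypothetical helper for illustration only, not torchvision's code.
    """
    # Every immediate subfolder is treated as one class, in sorted order.
    classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    # Labels are simply positions in that sorted list.
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


# Build a throwaway layout similar to the question's folders.
with tempfile.TemporaryDirectory() as root:
    for sub in ("folder0/cats", "folder0/dogs", "folder3/dogs"):
        os.makedirs(os.path.join(root, sub))
    print(list_classes(os.path.join(root, "folder0")))
    # (['cats', 'dogs'], {'cats': 0, 'dogs': 1})
    print(list_classes(os.path.join(root, "folder3")))
    # (['dogs'], {'dogs': 0})
```

Note that the indices are assigned per folder: in this sketch a dogs-only folder gives dogs the index 0, while the folder with both classes gives dogs the index 1.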

However, ConcatDataset does not inherit from ImageFolder and, more generally, does not provide a classes attribute. It would be difficult for it to do so, because ImageFolder finds its classes by relying on a specific folder structure, whereas ConcatDataset makes no such assumption so that it can work with any kind of dataset.

If this functionality is essential to you, you could write a simple dataset type that inherits from ConcatDataset, expects ImageFolder datasets specifically, and stores the classes as a union of the possible classes from each constituent dataset.
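A minimal sketch of such a subclass might look like the following. The class name ConcatImageFolders and the _remap attribute are made up here; nothing like this ships with torch or torchvision. It assumes every constituent dataset has a classes list and returns (sample, label) pairs. It also goes one step beyond storing the union: since each ImageFolder numbers its classes from 0 based on its own subfolders (a dogs-only folder would label dogs as 0), the sketch translates each label into the merged numbering.

```python
import bisect

from torch.utils.data import ConcatDataset


class ConcatImageFolders(ConcatDataset):
    """ConcatDataset that exposes a merged `classes` list.

    Hypothetical helper, not part of torch or torchvision. Each dataset is
    assumed to have a `classes` attribute and to return (sample, label),
    where label indexes into that dataset's own `classes`.
    """

    def __init__(self, datasets):
        super().__init__(datasets)
        # Union of class names, sorted the same way ImageFolder sorts them.
        self.classes = sorted({name for ds in self.datasets for name in ds.classes})
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        # For each dataset: local label -> label in the merged numbering.
        self._remap = [
            [self.class_to_idx[name] for name in ds.classes] for ds in self.datasets
        ]

    def __getitem__(self, idx):
        # Let ConcatDataset do the lookup (and raise on bad indices) first.
        sample, label = super().__getitem__(idx)
        if idx < 0:
            idx += len(self)
        # cumulative_sizes holds the running end offset of each dataset,
        # so bisect tells us which dataset this index came from.
        ds_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        return sample, self._remap[ds_idx][label]
```

With the code from the question, this would be used as trainset = ConcatImageFolders([trainset1, trainset2, trainset3]), after which trainset.classes is available. If all of your folders happen to produce the same numbering, the remapping is a no-op and only the classes attribute matters.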

Upvotes: 2
