Reputation: 53
I want to feed a multiclass image dataset into PyTorch. The main dataset folder contains 15 subfolders with different names, and I want to use the folder names as labels. For example, one folder is named Aeroplanes and contains 1245 images, another is named Cars and contains 997 images of cars; each folder has a different number of images. I want to load them to train and test my model, but I don't have separate folders for training and testing. How can I use the folder names as labels and split the dataset into training and test sets in equal ratio? Any guidance would be appreciated. Thanks.
Upvotes: 0
Views: 989
Reputation: 9806
To split your dataset into training and test sets, you could use the random_split function:
import numpy as np
import torch
from torch.utils import data
from torchvision import datasets, transforms

# ImageFolder treats each subfolder name as a class label
dataset = datasets.ImageFolder('path_to_dataset', transform=transforms.ToTensor())
# 50/50 split; ceil + floor guarantees the two lengths sum to len(dataset)
lengths = [int(np.ceil(0.5 * len(dataset))),
           int(np.floor(0.5 * len(dataset)))]
train_set, test_set = data.random_split(dataset, lengths)
train_dataloader = data.DataLoader(train_set, batch_size=...)
test_dataloader = data.DataLoader(test_set, batch_size=...)
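Note that random_split splits the dataset as a whole, so with folders of different sizes the per-class ratio in each half is only approximately equal. If you need every class split by the same fraction, one option is to compute the indices per class yourself. The following is a minimal sketch in plain Python, not part of PyTorch: the helper name stratified_split_indices and its parameters are hypothetical, and the toy label list stands in for the labels of a real dataset.

```python
import random
from collections import defaultdict


def stratified_split_indices(targets, train_fraction=0.5, seed=0):
    """Return (train_indices, test_indices), splitting each class by train_fraction.

    Hypothetical helper for illustration; `targets` is a list of integer labels.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group sample indices by their class label
    by_class = defaultdict(list)
    for index, label in enumerate(targets):
        by_class[label].append(index)

    train_indices, test_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Number of training samples for this class, rounded half up
        cut = int(len(indices) * train_fraction + 0.5)
        train_indices.extend(indices[:cut])
        test_indices.extend(indices[cut:])
    return train_indices, test_indices


# Toy labels: 6 samples of class 0 and 4 samples of class 1
toy_targets = [0] * 6 + [1] * 4
train_idx, test_idx = stratified_split_indices(toy_targets, train_fraction=0.5, seed=0)
```

With ImageFolder, the labels should be available as dataset.targets, and the resulting index lists can be passed to data.Subset(dataset, train_idx) and data.Subset(dataset, test_idx) in place of the random_split call.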
If you want to apply different transformations to your train and test datasets, see: How to use different data augmentation for Subsets in PyTorch
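One common way to do this, which is not necessarily the approach taken in the linked answer, is to create the ImageFolder without a transform and wrap each split in a small dataset that applies its own transform. Below is a rough sketch under that assumption. The class name TransformedSubset is hypothetical, it is written without PyTorch imports so it runs on its own, and the toy list stands in for a real split; in practice you would likely subclass torch.utils.data.Dataset.

```python
class TransformedSubset:
    """Map-style wrapper that applies `transform` to each sample of `subset`.

    Hypothetical helper for illustration; `subset` is any indexable
    collection of (sample, label) pairs, such as one half of a split.
    """

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __getitem__(self, index):
        sample, label = self.subset[index]
        # Apply the split-specific transform, if one was given
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label

    def __len__(self):
        return len(self.subset)


# Toy stand-in for a split: (sample, label) pairs with integers as "images"
toy_subset = [(1, 0), (2, 1), (3, 0)]
train_view = TransformedSubset(toy_subset, transform=lambda x: x * 10)
test_view = TransformedSubset(toy_subset)  # no transform applied
```

In a real pipeline, the transform for the training split would typically include augmentation followed by transforms.ToTensor(), while the test split would use only the deterministic steps.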
Upvotes: 2