Reputation: 123
I'm trying to create my own DataLoader from a custom dataset for a CNN. The original DataLoader was created by writing:
train_loader = torch.utils.data.DataLoader(mnist_data, batch_size=64)
If I check the shape of the above, I get
i1, l1 = next(iter(train_loader))
print(i1.shape) # torch.Size([64, 1, 28, 28])
print(l1.shape) # torch.Size([64])
When I feed this train_loader into my CNN, it works beautifully. However, I have a custom dataset. I have done the following:
mnist_data = datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor())
trainset = mnist_data
testset = mnist_data
x_train = np.array(trainset.data)
y_train = np.array(trainset.targets)
# modify x_train/y_train
Now, how would I be able to turn x_train and y_train into a DataLoader similar to the first one? I have done the following:
train_data = []
for i in range(len(x_train)):
    train_data.append([x_train[i], y_train[i]])

train_loader = torch.utils.data.DataLoader(train_data, batch_size=64)
for i, (images, labels) in enumerate(train_loader):
    images = images.unsqueeze(1)
However, I'm still missing the channel dimension (which should be 1). How would I fix this?
Upvotes: 1
Views: 1341
Reputation: 2822
I don't have access to your x_train and y_train, but this will probably work:
from torch.utils.data import TensorDataset, DataLoader

# use x_train and y_train as numpy arrays without further modification
x_train = np.array(trainset.data)
y_train = np.array(trainset.targets)

# convert the numpy arrays to tensors; unsqueeze(1) adds the missing channel dimension
tensor_x = torch.Tensor(x_train).unsqueeze(1)
# labels should be integer class indices, so cast them to long
tensor_y = torch.tensor(y_train, dtype=torch.long)

# create the dataset
custom_dataset = TensorDataset(tensor_x, tensor_y)

# create your dataloader
my_dataloader = DataLoader(custom_dataset, batch_size=1)

# check that you get the desired shapes
i1, l1 = next(iter(my_dataloader))
print(i1.shape) # torch.Size([1, 1, 28, 28])
print(l1.shape) # torch.Size([1])
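One thing to keep in mind: trainset.data holds raw uint8 pixels in the range 0-255, while transforms.ToTensor() (used in your original loader) rescales images to floats in [0.0, 1.0]. If you want the custom loader to behave like the first one, a minimal sketch (dividing by 255 is my assumption about what you want, and batch_size=64 mirrors your original loader):

# rescale pixels to [0, 1] so the custom loader matches transforms.ToTensor()
tensor_x = torch.Tensor(x_train).unsqueeze(1) / 255.0
tensor_y = torch.tensor(y_train, dtype=torch.long)

my_dataloader = DataLoader(TensorDataset(tensor_x, tensor_y), batch_size=64)

i1, l1 = next(iter(my_dataloader))
print(i1.shape) # torch.Size([64, 1, 28, 28])
print(l1.shape) # torch.Size([64])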
Upvotes: 2