Reputation: 563
I need to run some experiments on custom datasets using pytorch. The question is, how can I create a dataset using torch.Dataloader?
I have two lists, one is called Values and has a datapoint tensor at every entry, and the other one is called Labels, that has the corresponding label. What I did is the following:
for i in range(samples):
dataset[i] = [values[i],labels[I]]
So I have a list with datapoint and respective label, and then tried the following:
dataset = torch.tensor(dataset).float()
dataset = torch.utils.data.TensorDataset(dataset)
data_loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=100, shuffle=True, num_workers=4, pin_memory=True)
But, first of all, I get the error "Not a sequence" in the torch.tensor command, and second, I'm not sure this is the right way of creating one. Any suggestion?
Thank you very much!
Upvotes: 0
Views: 3494
Reputation: 51
Just to enrich the answer by @shai
class MyDataset(Dataset):
def __init__(self, values):
super(MyDataset, self).__init__()
self.values = values
def __len__(self):
return len(self.values)
def __getitem__(self, index):
return self.values[index]
values = np.random.rand(51000, 3)
dataset = MyDataset(values)
Upvotes: 1
Reputation: 114866
You do not need to overload DataLoader
, but rather create a Dataset
for your data.
For instance,
class MyDataset(Dataset):
def __init__(self):
super(MyDataset, self).__init__()
# do stuff here?
self.values = values
self.labels = labels
def __len__(self):
return len(self.values) # number of samples in the dataset
def __getitem__(self, index):
return self.values[index], self.labels[index]
Upvotes: 3