Alfred

Reputation: 563

How to create a completely (uniformly) random dataset on PyTorch

I need to run some experiments on custom datasets using PyTorch. The question is: how can I create a dataset that can be fed to torch.utils.data.DataLoader?

I have two lists: one is called values and has a datapoint tensor at every entry, and the other is called labels, which holds the corresponding label for each datapoint. What I did is the following:

for i in range(samples):
    dataset[i] = [values[i], labels[i]]

So I have a list with datapoint and respective label, and then tried the following:

dataset = torch.tensor(dataset).float()
dataset = torch.utils.data.TensorDataset(dataset)

data_loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=100, shuffle=True, num_workers=4, pin_memory=True)

But, first of all, I get the error "Not a sequence" from the torch.tensor call, and second, I'm not sure this is the right way of creating a dataset. Any suggestions?

Thank you very much!

Upvotes: 0

Views: 3494

Answers (2)

George yang

Reputation: 51

Just to enrich the answer by @Shai:

import numpy as np
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, values):
        super(MyDataset, self).__init__()
        self.values = values

    def __len__(self):
        return len(self.values)

    def __getitem__(self, index):
        return self.values[index]

# 51000 uniformly random samples with 3 features each
values = np.random.rand(51000, 3)

dataset = MyDataset(values)
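
A minimal sketch of how this dataset of uniformly random values could then be wrapped in a DataLoader (the batch size here is just illustrative, taken from the question):

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=100, shuffle=True)

for batch in loader:
    # the numpy rows are collated into a float64 tensor of shape (100, 3)
    print(batch.shape)
    break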

Upvotes: 1

Shai

Reputation: 114866

You do not need to overload DataLoader, but rather create a Dataset for your data.
For instance,

from torch.utils.data import Dataset

class MyDataset(Dataset):
  def __init__(self, values, labels):
    super(MyDataset, self).__init__()
    # store the pre-computed datapoints and their labels
    self.values = values
    self.labels = labels

  def __len__(self):
    return len(self.values)  # number of samples in the dataset

  def __getitem__(self, index):
    return self.values[index], self.labels[index]
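
With the values and labels lists from the question, a usage sketch might look like this (the DataLoader arguments are just the ones from the question, and it is assumed every datapoint tensor has the same shape so the default collate function can stack them):

from torch.utils.data import DataLoader

dataset = MyDataset(values, labels)  # the two lists from the question
data_loader = DataLoader(dataset, batch_size=100, shuffle=True,
                         num_workers=4, pin_memory=True)

for batch_values, batch_labels in data_loader:
    # batch_values is a stacked tensor of 100 datapoints,
    # batch_labels the matching 100 labels
    ...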

Upvotes: 3
