Varun Balupuri

Reputation: 363

PyTorch: Variable data has to be a tensor -- but data is already a tensor

I am trying to load data using PyTorch's Dataset and DataLoader classes. I use torch.from_numpy to convert each array to a tensor in the torch Dataset, and from inspecting the data, each X and y is indeed a tensor:

# At this point dataset is {'X': numpy array of arrays, 'y': numpy array of arrays  } 

import torch
from torch.utils.data import DataLoader
from torch.autograd import Variable

class TorchDataset(torch.utils.data.Dataset):
    def __init__(self, dataset):
        self.X_train = torch.from_numpy(dataset['X'])
        self.y_train = torch.from_numpy(dataset['y'])

    def __len__(self):
        return len(self.X_train)

    def __getitem__(self, index):
        return {'X': self.X_train[index], 'y': self.y_train[index]}

torch_dataset = TorchDataset(dataset)
dataloader = DataLoader(torch_dataset, batch_size=4, shuffle=True, num_workers=4)


for epoch in range(num_epochs):
    for X, y in enumerate(dataloader):
        features = Variable(X)
        labels = Variable(y)
        ....

However, on features = Variable(X) I get:

RuntimeError: Variable data has to be a tensor, but got int

An example of an X and y in the dataset are:

In [1]: torch_dataset[1]
Out[1]: 
{'X': 
 -2.5908 -3.1123 -2.9460  ...  -3.9898 -4.0000 -3.9975
 -3.0867 -2.9992 -2.5254  ...  -4.0000 -4.0000 -4.0000
 -2.7665 -2.5318 -2.7035  ...  -4.0000 -4.0000 -4.0000
       ...             ⋱             ...          
 -2.4784 -2.6061 -1.6280  ...  -4.0000 -4.0000 -4.0000
 -2.2046 -2.1778 -1.5626  ...  -3.9597 -3.9366 -3.9497
 -1.9623 -1.9468 -1.5352  ...  -3.8485 -3.8474 -3.8474
 [torch.DoubleTensor of size 1024x1024], 'y': 
  107
 [torch.LongTensor of size 1]}

which is why it is very confusing for me that torch thinks X is an int. Any help would be much appreciated - thanks!

Upvotes: 0

Views: 4232

Answers (2)

jdhao

Reputation: 28349

There is an error in your use of enumerate, which caused the problem: the first value enumerate returns is the batch index, not the actual data. That index is an int, which is exactly what the error message is complaining about. There are two ways you can make your script work.
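You can reproduce the failure mode with plain Python, since enumerate behaves the same on any iterable. Below, a list of tuples stands in for the batches a DataLoader would yield (hypothetical data, just for illustration):

```python
# Hypothetical stand-in for the batches a DataLoader would yield.
batches = [("features_0", "labels_0"), ("features_1", "labels_1")]

# Buggy pattern from the question: X is bound to the *index*,
# and y is bound to the whole batch tuple.
for X, y in enumerate(batches):
    print(type(X))  # X is an int (the batch index), not the features

# Correct pattern: unpack the index separately from the batch contents.
for batch_id, (X, y) in enumerate(batches):
    print(batch_id, X, y)
```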

First way

Since your X and y do not need any special processing, you can just return a sample of X and y directly. Change your __getitem__ method to:

def __getitem__(self, index):
    return self.X_train[index], self.y_train[index]

Also, change your training loop a little bit:

for epoch in range(num_epochs):
    for batch_id, (x, y) in enumerate(dataloader):
        x = Variable(x)
        y = Variable(y)
        # then do whatever you want to do

Second way

You can return a dict in the __getitem__ method and extract the actual data in the training loop. In this case, you do not need to change the __getitem__ method. Just change your training loop:

for epoch in range(num_epochs):
    for batch_id, data in enumerate(dataloader):
        # data will be a dict
        x = Variable(data['X'])
        y = Variable(data['y'])
        # then do whatever you want to do
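As a quick sanity check, the dict-unpacking pattern can be exercised without PyTorch at all; here a plain list of dicts stands in for the DataLoader (the values are made up, only the access pattern matters):

```python
# Hypothetical stand-in for a DataLoader: any iterable of dict batches.
dataloader = [
    {'X': [0.1, 0.2], 'y': [0]},
    {'X': [0.3, 0.4], 'y': [1]},
]

for batch_id, data in enumerate(dataloader):
    x = data['X']  # the actual features, not the batch index
    y = data['y']
    print(batch_id, x, y)
```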

Upvotes: 2

p13rr0m

Reputation: 1297

Notice that you are using enumerate in the for-loop, so what you are actually doing is the following:

for batch_index, batch in enumerate(dataloader):

Upvotes: 1
