Reputation: 39
I am trying to use the GPU to train my model, but it seems that torch fails to allocate GPU memory.
My model is an RNN built with PyTorch:
device = torch.device('cuda: 0' if torch.cuda.is_available() else "cpu")
rnn = RNN(n_letters, n_hidden, n_categories_train)
rnn.to(device)
criterion = nn.NLLLoss()
criterion.to(device)
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate, weight_decay=.9)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        input = input.cuda()
        hidden = hidden.cuda()
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        output = output.cuda()
        hidden = hidden.cuda()
        return output, hidden

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size).cuda())
Training function:
def train(category_tensor, line_tensor, rnn, optimizer, criterion):
    rnn.zero_grad()
    hidden = rnn.init_hidden()
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    loss = criterion(output, category_tensor)
    loss.backward()
    optimizer.step()
    return output, loss.item()
The function to get category_tensor and line_tensor:
def random_training_pair(category_lines, n_letters, all_letters):
    category = random.choice(all_categories_train)
    line = random.choice(category_lines[category])
    category_tensor = Variable(torch.LongTensor([all_categories_train.index(category)]).cuda())
    line_tensor = Variable(process_data.line_to_tensor(line, n_letters, all_letters)).cuda()
    return category, line, category_tensor, line_tensor
I ran the following code:
print(torch.cuda.get_device_name(0))
print('Memory Usage:')
print('Allocated:', round(torch.cuda.memory_allocated(0) / 1024 ** 3, 1), 'GB')
print('Cached: ', round(torch.cuda.memory_cached(0) / 1024 ** 3, 1), 'GB')
and I got:
GeForce GTX 1060
Memory Usage:
Allocated: 0.0 GB
Cached: 0.0 GB
I did not get any errors, but GPU usage is just 1% while CPU usage is around 31%.
I am using Windows 10 and Anaconda, where my PyTorch is installed. CUDA and cuDNN are installed from the .exe files downloaded from the Nvidia website.
Upvotes: 3
Views: 3684
Reputation: 121
The issue is that the CUDA build of PyTorch is not installed correctly. If the CUDA build were installed, then the following statement
device = torch.device('cuda: 0' if torch.cuda.is_available() else "cpu")
would raise a RuntimeError:
RuntimeError: Invalid device string: 'cuda: 0'
because the correct usage is cuda:0, without a space.
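For reference, the device line from the question with the corrected string would look like this:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')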
You should check the installed version first, for example with conda list:
$ conda list
# packages in environment at /home/maniac/.conda/envs/torch:
#
# Name Version Build Channel
...
torch 2.0.0+cu118 pypi_0 pypi
...
The +cu118 suffix shows that the CUDA build of PyTorch is correctly installed. If the version shows 2.0.0+cpu instead, then PyTorch runs on the CPU only.
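As an additional quick check, you can also inspect the build from within Python; a minimal sketch:

import torch

print(torch.__version__)          # e.g. 2.0.0+cu118 for a CUDA build, 2.0.0+cpu for a CPU-only build
print(torch.version.cuda)         # CUDA version PyTorch was built against, or None on a CPU-only build
print(torch.cuda.is_available())  # False on a CPU-only build or when no usable GPU/driver is found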
Upvotes: 0
Reputation: 24099
Your problem is that to() is not an in-place operation. If you call rnn.to(device), it will return a new object / model located on the desired device, but it will not move the old object anywhere!
So changing:
rnn = RNN(n_letters, n_hidden, n_categories_train)
rnn.to(device)
to:
rnn = RNN(n_letters, n_hidden, n_categories_train).to(device)
should do the trick for you. You have to make the same change everywhere else you used to() this way.
Note: All tensors and parameters you perform operations with have to be on the same device. If your model is on the GPU but your input tensor is on the CPU, you will get an error message.
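As a minimal sketch of what that could look like with the names from the question (assuming device is the torch.device defined there), the same pattern applies to the criterion and to the input tensors:

rnn = RNN(n_letters, n_hidden, n_categories_train).to(device)
criterion = nn.NLLLoss().to(device)

# inside random_training_pair, instead of the hard-coded .cuda() calls:
category_tensor = torch.LongTensor([all_categories_train.index(category)]).to(device)
line_tensor = process_data.line_to_tensor(line, n_letters, all_letters).to(device)

(The Variable wrapper from the question is not required here; plain tensors track gradients in current PyTorch.)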
Upvotes: 4