IDoNot1xist

Reputation: 39

Torch allocates zero GPU memory on PyTorch

I am trying to use GPU to train my model but it seems that torch fails to allocate GPU memory.

My model is an RNN built with PyTorch:

device = torch.device('cuda: 0' if torch.cuda.is_available() else "cpu")

rnn = RNN(n_letters, n_hidden, n_categories_train)
rnn.to(device)
criterion = nn.NLLLoss()
criterion.to(device)
optimizer = torch.optim.SGD(rnn.parameters(), lr=learning_rate, weight_decay=.9)
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)

        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        input = input.cuda()
        hidden = hidden.cuda()

        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)

        output = output.cuda()
        hidden = hidden.cuda()

        return output, hidden

    def init_hidden(self):
        return Variable(torch.zeros(1, self.hidden_size).cuda())

Training function:

def train(category_tensor, line_tensor, rnn, optimizer, criterion):
    rnn.zero_grad()
    hidden = rnn.init_hidden()

    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    optimizer.step()

    return output, loss.item()

The function to get category_tensor and line_tensor:

def random_training_pair(category_lines, n_letters, all_letters):
    category = random.choice(all_categories_train)
    line = random.choice(category_lines[category])
    category_tensor = Variable(torch.LongTensor([all_categories_train.index(category)]).cuda())
    line_tensor = Variable(process_data.line_to_tensor(line, n_letters, all_letters)).cuda()

    return category, line, category_tensor, line_tensor

I ran the following code:

 print(torch.cuda.get_device_name(0))
 print('Memory Usage:')
 print('Allocated:', round(torch.cuda.memory_allocated(0) / 1024 ** 3, 1), 'GB')
 print('Cached:   ', round(torch.cuda.memory_cached(0) / 1024 ** 3, 1), 'GB')

and I got:

GeForce GTX 1060
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB

I did not get any errors, but GPU usage is just 1% while CPU usage is around 31%.

I am using Windows 10 and Anaconda, where my PyTorch is installed. CUDA and cuDNN are installed from the .exe installers downloaded from the Nvidia website.

Upvotes: 3

Views: 3684

Answers (2)

Hao-Ting Li

Reputation: 121

The issue is that the CUDA version of PyTorch is not installed correctly. If a CUDA build were installed, the following statement

device = torch.device('cuda: 0' if torch.cuda.is_available() else "cpu")

would raise a RuntimeError:

RuntimeError: Invalid device string: 'cuda: 0'

because the correct device string is cuda:0, without a space. Since you did not get this error, your PyTorch is most likely a CPU-only build.
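You can reproduce the difference directly (a minimal sketch; note that in the question's code the bad string is only parsed when torch.cuda.is_available() is True):

import torch

# the space makes the device string invalid, so parsing it fails
try:
    torch.device('cuda: 0')
except RuntimeError as e:
    print(e)  # Invalid device string: 'cuda: 0'

# the correct form has no space
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)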

You should check the version first. For example, type conda list as follows:

$ conda list

# packages in environment at /home/maniac/.conda/envs/torch:
#
# Name                    Version                   Build  Channel
...
torch                     2.0.0+cu118              pypi_0    pypi
...

The +cu118 suffix shows that a CUDA build of PyTorch is installed. If the version shows 2.0.0+cpu instead, then PyTorch runs on the CPU only.
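You can also verify this from Python itself (a minimal sketch):

import torch

print(torch.__version__)          # e.g. '2.0.0+cu118' for a CUDA build, '2.0.0+cpu' otherwise
print(torch.version.cuda)         # CUDA version the build was compiled with, or None on a CPU-only build
print(torch.cuda.is_available())  # True only with a CUDA build and a working GPU driver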

Upvotes: 0

MBT

Reputation: 24099

Your problem is that to() is not an in-place operation on tensors. If you call tensor.to(device), it returns a new tensor located on the desired device, but it does not move the original tensor anywhere. (For an nn.Module such as rnn, .to(device) does move the parameters in place, but assigning the return value works for tensors and modules alike and keeps the code consistent.)

So change:

rnn = RNN(n_letters, n_hidden, n_categories_train)
rnn.to(device)

to:

rnn = RNN(n_letters, n_hidden, n_categories_train).to(device)

You have to make the same change everywhere else you call to() this way.

That should do the trick!

Note: all tensors and parameters you perform operations on have to be on the same device. If your model is on the GPU but your input tensor is still on the CPU, you will get an error message.
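Applied to the code from the question, it could look like this (a minimal sketch; category_tensor, line_tensor and the hyperparameters are assumed to be defined as in the question):

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# modules: assigning the result of to() works for modules and tensors alike
rnn = RNN(n_letters, n_hidden, n_categories_train).to(device)
criterion = nn.NLLLoss().to(device)

# tensors: to() returns a new tensor, so the result must be assigned
category_tensor = category_tensor.to(device)
line_tensor = line_tensor.to(device)
hidden = rnn.init_hidden().to(device)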

Upvotes: 4
