Ashish Rao

Reputation: 91

How to use the GPU while training a model?

I am running code to train a ResNet model in a Kaggle notebook. I have set the accelerator to GPU, so I haven't made any mistakes there. I am training the model using the following code:

model.cuda()  # move the model's parameters to the GPU
for epoch in range(10):
  model.train(True)
  trainloss = 0
  for x, y in trainloader:
    x, y = x.cuda(), y.cuda()  # move the batch to the GPU

    yhat = model(x)
    optimizer.zero_grad()
    loss = criterion(yhat, y)
    loss.backward()
    optimizer.step()
    trainloss += loss.item()

  print('Epoch {}  Loss: {}'.format(epoch, trainloss/len(trainloader.dataset)))
  model.eval()
  testcorrect = 0
  with torch.no_grad():
    for test_x, test_y in testloader:
      test_x, test_y = test_x.cuda(), test_y.cuda()  # move the batch to the GPU
      yhat = model(test_x)
      _, z = yhat.max(1)  # predicted class = index of the largest logit
      testcorrect += (test_y == z).sum().item()

print('Model Accuracy: ', testcorrect/len(testloader.dataset))

Network Code:

model = torchvision.models.resnet18(pretrained=True)

num_ftrs = model.fc.in_features
# replace the final fully connected layer with a two-class head
model.fc = nn.Sequential(nn.Linear(num_ftrs, 1000),
                         nn.ReLU(),
                         nn.Linear(1000, 2))

As you can see, I have called .cuda() on both the model and the tensors (in the training part as well as the validation part). However, the GPU usage shown for the Kaggle notebook is 0% while my CPU usage goes up to 99%. Am I missing any code that is required to train the model using the GPU?
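For reference, one way to double-check the device placement (a minimal sketch, assuming a single GPU at index 0) is:

import torch

print(torch.cuda.is_available())         # should be True with the GPU accelerator enabled
print(torch.cuda.get_device_name(0))     # name of the Kaggle GPU
print(next(model.parameters()).is_cuda)  # True after model.cuda()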

Upvotes: 1

Views: 2269

Answers (1)

Alexander Pivovarov

Reputation: 4990

It might be that your model doesn't give the GPU enough work. Try making your network more GPU-hungry, e.g. introduce a linear layer with a large number of neurons, and double-check that you then see increased GPU usage. I have also noticed that the usage measurement is delayed a bit, so it may be that you give the GPU work it finishes in a fraction of a second, and the usage bar never gets a chance to move off 0%.

Maybe you could share the actual network you're using?

I can see the GPU usage going to 100% in a Kaggle notebook with a toy example like this (notice the 2500 x 2500 linear layer here):

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# 1000 identical batches of 1000 random 5-feature samples with constant targets
trainloader = [(torch.Tensor(np.random.randn(1000, 5)), torch.Tensor([1.0] * 1000))] * 1000

# deliberately wide layers so each forward/backward pass keeps the GPU busy
model = nn.Sequential(nn.Linear(5, 2500), nn.Linear(2500, 1500), nn.Linear(1500, 1))
model.cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.)
criterion = lambda x, y: ((x - y) ** 2).mean()  # plain mean squared error

for epoch in range(10):
  for x, y in trainloader:
    x, y = x.cuda(), y.cuda()
    yhat = model(x)
    optimizer.zero_grad()
    loss = criterion(yhat, y)
    loss.backward()
    optimizer.step()
  print(epoch)
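Applied to your ResNet, the same idea would look roughly like this (a sketch; the 4096-unit width is an arbitrary choice, just to give the GPU more to do):

# sketch: a deliberately wider replacement head (4096 is arbitrary) to see if GPU usage rises
model.fc = nn.Sequential(nn.Linear(num_ftrs, 4096),
                         nn.ReLU(),
                         nn.Linear(4096, 2))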

Upvotes: 1
