Raptor

Reputation: 54268

Jupyter Lab is (supposedly) running with a GPU, but nvidia-smi says otherwise

Here is the output of nvidia-smi while the GPU-intensive code is running:

$ nvidia-smi
Mon Feb 13 10:20:42 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:81:00.0 Off |                  Off |
|  0%   47C    P8    28W / 450W |      8MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:C1:00.0 Off |                  Off |
|  0%   36C    P8    29W / 450W |      8MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1947      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      1947      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

It shows 0% utilization for both GPUs, and the script does not show up in the Processes list, which means the GPUs are idle.

The check for the available device is:

import torch

# show PyTorch version
print(torch.__version__)
# Check if CUDA is available
print('Is CUDA available?', torch.cuda.is_available())

And the output of the above is:

1.13.1+cu117
Is CUDA available? True

The GPU code in question is available here. Sorry for not pasting the whole chunk of code, as it is too long.


UPDATE: The code in question is as follows:

class Net(nn.Module):
    device = torch.device("cuda") # I added this
    def __init__(self, n_vocab, embedding_dim, hidden_dim, dropout=0.2):
        super(Net, self).__init__()

        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim # dim = dimension
        
        embedding_dim.to(device) # I added this

        self.embeddings = nn.Embedding(n_vocab, embedding_dim)

        # LSTM Layer (input_size, hidden_size)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)

        # Fully connected layer, change "Hidden State" Linear to output
        self.hidden2out = nn.Linear(hidden_dim, n_vocab)

    def forward(self, seq_in):
        seq_in.to(device) # I added this
        embeddings = self.embeddings(seq_in.t())

        lstm_out, _ = self.lstm(embeddings)
        ht = lstm_out[-1]

        out = self.hidden2out(ht)

        return out

The RuntimeError occurs at the line embeddings = self.embeddings(seq_in.t()).

The full RuntimeError is as follows:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

How can I modify the Net class to make it work again?

Upvotes: 1

Views: 300

Answers (1)

Felix Zimmermann

Reputation: 392

General information: In PyTorch, each tensor lives on one of several "devices". Most operations performed on a tensor are executed using the compute capabilities of the associated device. If an operation takes multiple inputs, in most cases all tensor inputs have to be on the same device.
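For example, a minimal reproduction of that restriction (assuming a machine with at least one CUDA device):

import torch

a = torch.randn(3, device='cuda')  # tensor on the GPU
b = torch.randn(3)                 # tensor on the CPU (the default device)

# a + b  ->  RuntimeError: Expected all tensors to be on the same device, ...
b = b.to(device='cuda')            # move b to the GPU
c = a + b                          # both inputs are now on cuda:0, so this works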

So, for your model to use a GPU, not only do you need to have a GPU available, but all data also has to be explicitly moved to the GPU.

If you use a framework such as PyTorch Lightning, this might be partially done automatically.

Otherwise, the basic recipe is:

model = Network()                # create an instance of your model
model = model.to(device='cuda')  # move the model parameters to the GPU
for batch in dataloader:
    x, y, *_ = batch             # unpack the data and labels of the batch
    x = x.to(device='cuda')      # move the data to the GPU
    y = y.to(device='cuda')      # move the labels to the GPU

    prediction = model(x)        # apply the model
    loss = lossfunction(prediction, y)
    ...
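Note that model.to(...) moves the parameters of an nn.Module in place (and also returns the module), while x.to(...) on a plain tensor returns a new tensor, which is why the reassignments x = x.to(...) and y = y.to(...) above are necessary.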

So, one reason code might not be using the GPU, without raising any error message, is that the data is never moved to the GPU!

You can check which device a tensor x resides on by printing x.device.
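For instance, assuming a single CUDA device:

x = torch.zeros(3)
print(x.device)          # cpu
x = x.to(device='cuda')
print(x.device)          # cuda:0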

Edit: As a rule of thumb, I would advise against moving tensors around inside the forward or __init__ functions of your network if you can avoid it; instead (see the sketch after the list):

  1. Move the whole instance of your network to the device
  2. Move all data inside your training loop to the device
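
Applied to the Net class from the question, that means deleting the three added lines and keeping all device handling outside the model. A minimal sketch (the dataloader and the hyperparameter values are placeholders, not taken from the question):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, n_vocab, embedding_dim, hidden_dim, dropout=0.2):
        super(Net, self).__init__()

        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim

        self.embeddings = nn.Embedding(n_vocab, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)
        self.hidden2out = nn.Linear(hidden_dim, n_vocab)

    def forward(self, seq_in):
        # no .to(...) here; seq_in is expected to arrive on the correct device
        embeddings = self.embeddings(seq_in.t())
        lstm_out, _ = self.lstm(embeddings)
        ht = lstm_out[-1]
        return self.hidden2out(ht)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net(n_vocab=1000, embedding_dim=128, hidden_dim=256).to(device)  # placeholder sizes

for seq_in, target in dataloader:  # dataloader is a placeholder
    seq_in = seq_in.to(device)     # move the inputs to the same device as the model
    target = target.to(device)
    prediction = model(seq_in)
    ...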

Upvotes: 2
