Raptor

Reputation: 54268

Jupyter Lab is (supposedly) running with a GPU, but nvidia-smi says otherwise

Here is the output of nvidia-smi while the GPU-intensive code is running:

$ nvidia-smi
Mon Feb 13 10:20:42 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:81:00.0 Off |                  Off |
|  0%   47C    P8    28W / 450W |      8MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:C1:00.0 Off |                  Off |
|  0%   36C    P8    29W / 450W |      8MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1947      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      1947      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

It shows 0% utilization for both GPUs, and the script does not show up in the Processes list, which means the GPUs are idle.

The check for the available device is:

import torch

# show PyTorch version
print(torch.__version__)
# Check if CUDA is available
print('Is CUDA available?', torch.cuda.is_available())

And the output of the above is:

1.13.1+cu117
Is CUDA available? True

The GPU code in question is available here. Sorry for not pasting the whole chunk of code, as it is too long.


UPDATE: The code in question is as follows:

class Net(nn.Module):
    device = torch.device("cuda") # I added this
    def __init__(self, n_vocab, embedding_dim, hidden_dim, dropout=0.2):
        super(Net, self).__init__()

        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim # dim = dimension
        
        embedding_dim.to(device) # I added this

        self.embeddings = nn.Embedding(n_vocab, embedding_dim)

        # LSTM Layer (input_size, hidden_size)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)

        # Fully connected layer, change "Hidden State" Linear to output
        self.hidden2out = nn.Linear(hidden_dim, n_vocab)

    def forward(self, seq_in):
        seq_in.to(device) # I added this
        embeddings = self.embeddings(seq_in.t())

        lstm_out, _ = self.lstm(embeddings)
        ht = lstm_out[-1]

        out = self.hidden2out(ht)

        return out

The RuntimeError occurs at the line embeddings = self.embeddings(seq_in.t()).

The full RuntimeError is as follows:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

How can I modify the Net class to make it work again?

Upvotes: 1

Views: 300

Answers (1)

Felix Zimmermann

Reputation: 392

General information: In PyTorch, each tensor lives on one of several "devices". Most operations performed on a tensor are executed using the compute capabilities of the associated device. If an operation takes multiple inputs, in most cases all tensor inputs have to be on the same device.
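For example, a minimal reproduction of that restriction (assuming a machine with at least one CUDA device):

import torch

a = torch.randn(3, device='cuda')  # tensor on the GPU
b = torch.randn(3)                 # tensor on the CPU (the default device)

# a + b  ->  RuntimeError: Expected all tensors to be on the same device, ...
b = b.to(device='cuda')            # move b to the GPU
c = a + b                          # both inputs are now on cuda:0, so this works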

So, for your model to use a GPU, not only do you need to have a GPU available, but all data also has to be explicitly moved to the GPU.

If you use a framework such as PyTorch Lightning, this might be partially done automatically.

Otherwise, the basic recipe is:

model = Network()                # create an instance of your model
model = model.to(device='cuda')  # move the model parameters to the GPU
for batch in dataloader:
    x, y, *_ = batch             # unpack the data and labels of the batch
    x = x.to(device='cuda')      # move the data to the GPU
    y = y.to(device='cuda')      # move the labels to the GPU

    prediction = model(x)        # apply the model
    loss = lossfunction(prediction, y)
    ...
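Note that model.to(...) moves the parameters of an nn.Module in place (and also returns the module), while x.to(...) on a plain tensor returns a new tensor, which is why the reassignments x = x.to(...) and y = y.to(...) above are necessary.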

So, one reason code might not be using the GPU, without raising any error message, is that the data is never moved to the GPU!

You can check which device a tensor x resides on by printing x.device.
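For instance, assuming a single CUDA device:

x = torch.zeros(3)
print(x.device)          # cpu
x = x.to(device='cuda')
print(x.device)          # cuda:0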

Edit: As a rule of thumb, I would advise against moving tensors around inside the forward or __init__ functions of your network if you can avoid it; instead (see the sketch after the list):

  1. Move the whole instance of your network to the device
  2. Move all data inside your training loop to the device
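
Applied to the Net class from the question, that means deleting the three added lines and keeping all device handling outside the model. A minimal sketch (the dataloader and the hyperparameter values are placeholders, not taken from the question):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, n_vocab, embedding_dim, hidden_dim, dropout=0.2):
        super(Net, self).__init__()

        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim

        self.embeddings = nn.Embedding(n_vocab, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout)
        self.hidden2out = nn.Linear(hidden_dim, n_vocab)

    def forward(self, seq_in):
        # no .to(...) here; seq_in is expected to arrive on the correct device
        embeddings = self.embeddings(seq_in.t())
        lstm_out, _ = self.lstm(embeddings)
        ht = lstm_out[-1]
        return self.hidden2out(ht)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net(n_vocab=1000, embedding_dim=128, hidden_dim=256).to(device)  # placeholder sizes

for seq_in, target in dataloader:  # dataloader is a placeholder
    seq_in = seq_in.to(device)     # move the inputs to the same device as the model
    target = target.to(device)
    prediction = model(seq_in)
    ...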

Upvotes: 2
