Khabbab Zakaria

Reputation: 443

RuntimeError: CUDA error: no kernel image is available for execution on the device after model.cuda()

I am working on this model:

import torch
from torch.nn import LSTM

class Model(torch.nn.Module):
    def __init__(self, sizes, config):
        super(Model, self).__init__()

        self.lstm = []
        for i in range(len(sizes) - 2):
            self.lstm.append(LSTM(sizes[i], sizes[i+1], num_layers=8))
        self.lstm.append(torch.nn.Linear(sizes[-2], sizes[-1]).cuda())
        self.lstm = torch.nn.ModuleList(self.lstm)

        self.config_mel = config.mel_features

    def forward(self, x):
        # convert to log-domain
        x = x.clip(min=1e-6).log10()

        for layer in self.lstm[:-1]:
            x, _ = layer(x)
            x = torch.relu(x)

        #x = torch_unpack_seq(x)[0]

        x = self.lstm[-1](x)
        mask = torch.sigmoid(x)

        return mask

and then:

model = Model(model_width, config)
model.cuda()

But I am getting this error:

File "main.py", line 29, in <module>
    Model.train(args)
  File ".../src/model.py", line 57, in train
    model.cuda()
  File ".../.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 637, in cuda
    return self._apply(lambda t: t.cuda(device))
  File ".../.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/.../.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File ".../.local/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 189, in _apply
    self.flatten_parameters()
  File ".../.local/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 175, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I have no idea why this is happening. I am moving both the model and the inputs to CUDA. I would understand the error if some modules were on the CPU and others on the GPU, but that is not the case here. I found a pip install solution here: Pytorch CUDA error: no kernel image is available for execution on the device on RTX 3090 with cuda 11.1

but I cannot use it, because I am working in a remote environment where I don't have access to pip install.

Is there a way I can solve this?

Upvotes: 12

Views: 62433

Answers (3)

blue-zircon

Reputation: 316

You may have older versions of torch and CUDA installed. In that case, torch.cuda.is_available() still returns True, but torch.tensor([0.12, 0.32]).cuda() raises the error from the question.
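The difference between "CUDA is visible" and "CUDA is usable" can be checked with a small smoke test. This is a minimal sketch, not an official diagnostic: the function name cuda_smoke_test is made up for illustration, and it assumes the failure surfaces as a RuntimeError, as in the traceback above.

```python
def cuda_smoke_test():
    """Return (ok, message) after trying to run one small kernel on the GPU.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    try:
        import torch
    except ImportError:
        return False, "torch is not installed in this environment"

    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"

    try:
        t = torch.tensor([0.12, 0.32]).cuda()
        # .item() copies the result back to the host, which forces a sync,
        # so an asynchronously reported kernel error surfaces here.
        total = (t * 2).sum().item()
    except RuntimeError as exc:
        return False, f"GPU is visible but unusable: {exc}"

    return True, (
        f"ok (sum={total:.2f}, torch {torch.__version__}, "
        f"CUDA {torch.version.cuda})"
    )


if __name__ == "__main__":
    print(cuda_smoke_test())
```

If the first element of the result is False while is_available() is True, the installed build most likely lacks kernels for the GPU.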

Even though I used the install command from the PyTorch website (https://pytorch.org/get-started/locally/), an older version ended up installed. So, when you run the command, add -U after pip install to upgrade. That solved the problem for me.

pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

instead of

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Source - https://qiita.com/Uenie/items/95107f79512d90f73a19

Upvotes: 2

Kevin Patel

Reputation: 674

I checked the latest torch and torchvision versions built for CUDA at the link below. Stable versions list: https://download.pytorch.org/whl/cu113/torch_stable.html

The following versions solved the error:

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
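To confirm that the pinned build is the one actually being imported, the version string can be split into its release and CUDA tag. This is a rough sketch: split_torch_version and matches_pin are hypothetical helper names, and it assumes the wheel reports its CUDA tag after a "+" in torch.__version__ (as pip wheels such as 1.11.0+cu113 do; other builds may omit it).

```python
def split_torch_version(version):
    """Split '1.11.0+cu113' into ('1.11.0', 'cu113').

    The tag is None when the version string has no '+' suffix.
    Hypothetical helper for illustration; not part of PyTorch.
    """
    release, _, tag = version.partition("+")
    return release, (tag or None)


def matches_pin(version, release, tag):
    """Return True if the version string matches the pinned release and tag."""
    return split_torch_version(version) == (release, tag)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(split_torch_version(torch.__version__))
        print("matches pin:", matches_pin(torch.__version__, "1.11.0", "cu113"))
```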

Reference: #49161

Upvotes: 12

Khabbab Zakaria

Reputation: 443

talonmies' comment really helped:

The PyTorch installation you are trying to use doesn't have built-in binary support for the GPU you are trying to use. You will have to find (or make yourself) a build which has built in support. There is no work around here because of the design and packaging of PyTorch
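Whether a given build has binary support for a given GPU can be estimated by comparing the device's compute capability with the architectures the build was compiled for. The sketch below is an approximation under stated assumptions: arch_supported is a hypothetical helper, it treats an sm_XY binary as usable on devices with the same major version and an equal or higher minor version, treats compute_XY (PTX) as usable on equal or newer devices, and ignores any entries it cannot parse. torch.cuda.get_device_capability() and torch.cuda.get_arch_list() are the PyTorch calls that supply the inputs.

```python
import re

# Matches entries such as "sm_86" or "compute_70"; the last digit is the minor.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)$")


def arch_supported(capability, arch_list):
    """Return True if arch_list appears to cover a GPU with this capability.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list: strings such as ["sm_70", "sm_75", "compute_75"].
    Hypothetical helper for illustration; not part of PyTorch.
    """
    major, minor = capability
    for arch in arch_list:
        m = _ARCH_RE.match(arch)
        if m is None:
            continue  # skip entries this sketch does not understand
        kind, a_major, a_minor = m.group(1), int(m.group(2)), int(m.group(3))
        # Compiled binaries: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(cap, archs, arch_supported(cap, archs))
        else:
            print("no CUDA device visible")
```

For example, arch_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]) is False, which is the situation the comment describes for an RTX 3090 with an older build.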

The torch version was not compatible with the CUDA version. I was able to examine the issue in detail with CUDA_LAUNCH_BLOCKING=1. I uninstalled the previous CUDA version and installed the one I actually needed, and now it works.
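For reference, CUDA_LAUNCH_BLOCKING=1 can be set on the command line (CUDA_LAUNCH_BLOCKING=1 python main.py) or from Python. The sketch below shows the second option; it assumes the lines run at the very top of the entry script, before torch is imported, since the variable may not take effect once CUDA has been initialized. The helper name enable_blocking_launches is made up for illustration.

```python
import os


def enable_blocking_launches():
    """Make CUDA kernel launches synchronous so errors point at the real call.

    Assumption: call this before importing torch. Hypothetical helper.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()

# Import torch only after the variable is set (skipped if torch is missing).
try:
    import torch  # noqa: F401
except ImportError:
    torch = None
```

This setting slows execution, so it is usually enabled only while debugging.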

Upvotes: 3
