Sarthak Mittal

Reputation: 155

cuDNN error: CUDNN_STATUS_BAD_PARAM. Can someone explain why I am getting this error and how I can correct it?

I am trying to implement a character-level LSTM using PyTorch, but I am getting CUDNN_STATUS_BAD_PARAM errors. This is the training loop; the error is raised on the line output = model(input_seq).

for epoch in tqdm(range(epochs)):
    for i in range(len(seq) // batch_size):
        sidx = i * batch_size
        eidx = sidx + batch_size
        x = seq[sidx:eidx]
        x = torch.tensor(x).cuda()
        input_seq = torch.nn.utils.rnn.pack_padded_sequence(x, seq_lengths, batch_first=True)
        y = out_seq[sidx:eidx]
        output = model(input_seq)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)   
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    180         else:
    181             result = _impl(input, batch_sizes, hx, self._flat_weights, self.bias,
--> 182                            self.num_layers, self.dropout, self.training, self.bidirectional)
    183         output = result[0]
    184         hidden = result[1:] if self.mode == 'LSTM' else result[1]

 RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM

Upvotes: 5

Views: 24459

Answers (4)

I would have commented on Prateek's answer, but I can't, so I'm adding this here for future generations:

I ran the model on the CPU, and the error was upgraded to another, only half-helpful error for which I could not find a solution on the web:

RuntimeError: could not create a descriptor for a dilated convolution forward propagation primitive

For me it was a conv layer mistakenly defined with dilation=0 instead of 1. So, as the original error (CUDNN_STATUS_BAD_PARAM) suggests, make sure the parameters of the offending layer are valid.
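For illustration, a minimal sketch of that kind of mistake (the layer sizes here are hypothetical, not taken from the answer): dilation must be at least 1, and a value of 0 passes construction but fails once the layer runs a forward pass.

import torch
import torch.nn as nn

# Hypothetical example: dilation must be >= 1. A dilation of 0 only fails at
# forward time, surfacing as CUDNN_STATUS_BAD_PARAM on the GPU or as the
# dilated-convolution descriptor error on the CPU.
# bad_conv = nn.Conv2d(3, 16, kernel_size=3, dilation=0)   # invalid parameter
good_conv = nn.Conv2d(3, 16, kernel_size=3, dilation=1)     # valid (the default)

x = torch.randn(1, 3, 32, 32)
print(good_conv(x).shape)  # torch.Size([1, 16, 30, 30])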

Upvotes: 1

m.oghbaie

Reputation: 11

I had the same issue, and the problem was with torch==1.6. The solution can be found in the related GitHub issue. Take a look; it may be your solution as well.

Upvotes: 0

BruceLi

Reputation: 11

I encountered the same error. Here's the solution.

You should change the type of the input from float64 to float32, which means you should write:

input_seq = input_seq.float()
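As a quick sanity check (a small hypothetical example, not from the original post), tensors built from NumPy float64 arrays default to torch.float64, while the model's weights are torch.float32 by default:

import numpy as np
import torch

x = torch.tensor(np.zeros((2, 5)))  # NumPy float64 carries over to torch.float64
print(x.dtype)           # torch.float64
print(x.float().dtype)   # torch.float32, matching the model's default weight dtype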

Upvotes: -3

Prateek Rawat

Reputation: 89

I got the same error. If you switch to the CPU, you'll get a much better description of the error. In my case the problem was the type of the input I was giving to the network: I was sending long tensors, I guess, while the model needed float. I made the following change and the code worked; basically, switching to the CPU gives better error descriptions.

input_seq = input_seq.float().cuda()
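For illustration, a minimal, self-contained sketch of this debugging approach (the layer sizes and tensors are hypothetical, not taken from the question): the forward pass on the CPU reports a descriptive dtype error, and casting the input to float32 fixes it.

import torch
import torch.nn as nn

# Hypothetical sketch: float64 input against the LSTM's default float32 weights.
model = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(4, 7, 10, dtype=torch.float64)

try:
    model(x)  # on the CPU the error message names the dtype mismatch directly
except RuntimeError as err:
    print("CPU error:", err)

# The fix from this answer: cast to float32 (and call .cuda() again if training on the GPU).
output, (hidden, cell) = model(x.float())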

Upvotes: 8
