Reputation: 21
I am trying to train a network with PyTorch on a CUDA-enabled GeForce GTX 1070 GPU. I don't understand the error below, and I haven't found a similar problem reported anywhere. I don't know whether it is a CUDA issue or something in my code.
Traceback (most recent call last):
  File "main.py", line 497, in <module>
    main()
  File "main.py", line 167, in main
    train(train_loader, model, criterion, optimizer, epoch, normalizer)
  File "main.py", line 244, in train
    output = model(*input_var)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\1546544\Desktop\ML\model.py", line 147, in forward
    atom_fea = conv_func(atom_fea, nbr_fea, nbr_fea_idx)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\1546544\Desktop\ML\model.py", line 66, in forward
    total_gated_fea = self.fc_full(total_nbr_fea)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\functional.py", line 837, in linear
    output = input.matmul(weight.t())
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\variable.py", line 386, in matmul
    return torch.matmul(self, other)
  File "C:\ProgramData\Anaconda3\lib\site-packages\torch\functional.py", line 192, in matmul
    output = torch.mm(tensor1, tensor2)
RuntimeError: cublas runtime error : the GPU program failed to execute at C:/Anaconda2/conda-bld/pytorch_1519496000060/work/torch/lib/THC/THCBlas.cu:247
Upvotes: 2
Views: 2866
Reputation: 3205
I faced the same problem.
I fixed it by correcting the labels in my dataset.
The training labels were incorrect, which is why it failed during the backward() pass.
So checking that the labels have the expected values after loading them from disk or a database might be helpful.
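As a rough illustration of that kind of check, here is a minimal sketch in plain Python. The helper names `is_valid_label` and `find_bad_labels` are hypothetical, not part of PyTorch, and the rules are assumptions: classification labels should be whole numbers in the range 0 to `num_classes - 1`, and regression targets should be finite. Adjust them to whatever your loss function actually expects.

```python
import math


def is_valid_label(y, num_classes=None):
    """Return True if a single label looks usable for training.

    Assumption: with num_classes set, labels are class indices in
    [0, num_classes); with num_classes=None, labels are regression
    targets that only need to be finite.
    """
    # NaN and +/-inf are never valid targets.
    if isinstance(y, float) and not math.isfinite(y):
        return False
    if num_classes is None:
        return True
    # Class indices must be whole numbers inside the valid range.
    return y == int(y) and 0 <= y < num_classes


def find_bad_labels(labels, num_classes=None):
    """Return (index, value) pairs for every label that fails the check."""
    return [
        (i, y)
        for i, y in enumerate(labels)
        if not is_valid_label(y, num_classes)
    ]


if __name__ == "__main__":
    # Hypothetical labels for a 3-class problem; 3 and -1 are out of range.
    print(find_bad_labels([0, 1, 2, 3, -1], num_classes=3))
```

Running this over the labels once, right after loading and before they are moved to the GPU, reports the offending samples by index, which is usually easier to read than a CUDA-side error.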
Upvotes: 1