BenedictWilkins

Reputation: 1253

CUDA Illegal Memory Access error when using torch.cat

I was playing around with PyTorch's concatenation and wanted to see if I could use an output tensor that lives on a different device from the input tensors. Here is the code:

import torch
a = torch.ones(4)
b = torch.ones(4) 
c = torch.zeros(8).cuda()
print(c)
ab = torch.cat([a,b], out=c)
print(c)

I am running this inside a Jupyter notebook. PyTorch version: 1.7.1

I get the following error:

...
\Anaconda3\envs\...\lib\site-packages\torch\_tensor_str.py in __init__(self, tensor)
     87 
     88         else:
---> 89             nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
     90 
     91             if nonzero_finite_vals.numel() == 0:

RuntimeError: CUDA error: an illegal memory access was encountered

It happens if you try to access the tensor c (in this case with a print).

I couldn't find anything in the documentation that said I couldn't do this, other than perhaps this line:

" ... any python sequence of tensors of the same type ... "

The error is kind of curious though... any ideas?
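For what it's worth, CUDA kernels run asynchronously, which would explain why the illegal access from the cat call only shows up at the next operation that synchronizes with the GPU (here, the print). A minimal sketch, assuming the same setup as above where torch.cat itself returns without raising, that surfaces the error at the offending call instead:

import torch
a = torch.ones(4)
b = torch.ones(4)
c = torch.zeros(8).cuda()
ab = torch.cat([a, b], out=c)  # the launch returns without raising here
torch.cuda.synchronize()       # blocking on the GPU should raise the illegal
                               # memory access at this line instead of at a
                               # later, unrelated call such as print(c)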

Upvotes: 1

Views: 3608

Answers (2)

PeterO

Reputation: 43

I faced a similar issue and reproduced the error as above with minor differences:

# 080521 debug RuntimeError: CUDA error: an illegal memory access was encountered 
# https://stackoverflow.com/questions/66985008/cuda-illegal-memory-access-error-when-using-torch-cat

import torch
a = torch.ones(4)
b = torch.ones(4) 
c = torch.zeros(8).cuda()
print(c)
ab = torch.cat([a,b], out=c) # throws error below: 
print(c)

# RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
#         (when checking arugment for argument tensors in method wrapper__cat_out_out)
#         i.e. 'expected object of backend CUDA but got CPU'

Applying the logic from Using CUDA with pytorch? (setting the default tensor type to CUDA, so that a and b are also created on the GPU) solved the error:

import torch
torch.set_default_tensor_type('torch.cuda.FloatTensor')
a = torch.ones(4)
b = torch.ones(4) 
c = torch.zeros(8).cuda()
print(c)
ab = torch.cat([a,b], out=c)
print(c)

Upvotes: 0

trialNerror

Reputation: 3553

It appears that the behavior changes depending on the PyTorch version. With version 1.3.0 I get the error expected object of backend CUDA but got CPU, but with version 1.5.0 I do indeed get the same error as you do. This would probably be worth mentioning on their GitHub, because I believe the former error is more useful than the latter.

Anyway, both errors come from the fact that you concatenate CPU tensors into a GPU one. You can solve it very easily:

# Move the tensors to the GPU prior to concatenating
ab = torch.cat([a.cuda(),b.cuda()], out=c)

or

# Move the tensor after concatenating
c.copy_(torch.cat([a,b]).cuda())
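
If you'd rather not hard-code .cuda(), a device-agnostic variant of the first option (just a sketch of the same idea, using c.device explicitly):

# Move the inputs to whatever device the output tensor lives on
ab = torch.cat([a.to(c.device), b.to(c.device)], out=c)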

I don't have a notebook, but I believe you will have to restart your kernel; the error you get seems to break things pretty badly. My Python shell just cannot compute anything anymore after the illegal memory access.
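
To make that concrete, a small illustration (assuming you run this in the same process right after the failed cat): illegal-memory-access errors are sticky for the whole CUDA context, so even unrelated GPU calls keep failing until you restart.

d = torch.ones(2).cuda()  # this unrelated call now also raises
                          # "CUDA error: an illegal memory access was encountered"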

Upvotes: 2
