dagisky

Reputation: 79

RuntimeError: CUDA error: device-side assert triggered on loss function

/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [18,0,0], thread: [54,0,0] Assertion `input_val >= zero && input_val <= one` failed.

(the same assertion is repeated for threads [55,0,0] through [59,0,0])

Traceback (most recent call last):
  File "run_toys.py", line 215, in <module>
    loss = criterion(torch.reshape(out, [-1, dataset.out_dim]), torch.reshape(target, [-1, dataset.out_dim]))
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 530, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/functional.py", line 2526, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: CUDA error: device-side assert triggered

The Code

criterion = nn.CrossEntropyLoss()
loss = criterion(torch.reshape(out, [-1, dataset.out_dim]), torch.reshape(target, [-1, dataset.out_dim]))
loss = torch.mean(loss)

The shapes of the output and target are the same: torch.Size([640, 32])

The model runs fine on my CPU; the issue only appears when running on the GPU.

Upvotes: 4

Views: 10619

Answers (1)

Sherzodbek

Reputation: 316

There might be two reasons for the error:

  1. As the log says, input_val is not in the range [0, 1], so you should ensure that the model outputs are in that range. You can use torch.clamp() for this. Before calculating the loss, add the following line:
    out = out.clamp(0, 1)
  2. Maybe you are sure the model outputs are in the range [0, 1]. Then a very common problem is that the output contains some nan values, which trigger the assert as well. To prevent this you can use the following trick, again before calculating the loss:
    out[out != out] = 0 # or 1, depending on your model's needs

The trick here relies on the property that nan != nan evaluates to True, so we replace the nan entries with some valid number.
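Both fixes can be sketched together. This is a minimal, self-contained example (the out and target tensors are made up for illustration) assuming an nn.BCELoss criterion, which is what the traceback's call to F.binary_cross_entropy points to:

```python
import torch
import torch.nn as nn

# Hypothetical model outputs containing an out-of-range value and a nan,
# the two conditions that trigger the device-side assert in BCELoss.
out = torch.tensor([[1.2, -0.1], [float("nan"), 0.5]])
target = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

out = out.clamp(0, 1)  # fix 1: force every value into [0, 1]
out[out != out] = 0    # fix 2: replace nan entries (nan != nan is True)

criterion = nn.BCELoss()
loss = criterion(out, target)  # now computes without an assert
```

On recent PyTorch versions, torch.nan_to_num(out) is a more explicit alternative to the out[out != out] = 0 trick.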

Upvotes: 7
