CUDA kernel infinite loop when comparing int to threadIdx.x

Question

I had an issue with a much larger kernel, but it seems to distil down to the following code, from which the kernel never returns. Can someone please explain why there is an infinite loop?

__global__ void infinite_while_kernel(void)
{
    int index = 0;
    while (index >= threadIdx.x) {
        index--;
    }
    return;
}

int main(void) {
    infinite_while_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

In addition, the kernel below also hangs:

__global__ void not_infinite_while_kernel(void)
{
    int index = 0;
    while (index >= (unsigned int) 0u*threadIdx.x) {
        index--;
    }
    return;
}

Replacing threadIdx.x with 0 in the original kernel makes it return, as expected. I'm using the CUDA 5.5 toolkit, compiling with the -arch=sm_20 -O0 flags, and running on a Tesla M2090. I don't currently have access to any other hardware or toolkit versions (it's not my system).

Robert Crovella · Accepted Answer

This code hangs in ordinary C++ as well (try it):

int main(){
  int index = 0;
  while (index >= 0U){
    index--;
  }
  return 0;
}

When a signed int is compared with an unsigned int, the usual arithmetic conversions convert the signed operand to unsigned int before the comparison is made.
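As a minimal plain-C++ sketch of that conversion (no CUDA required), the helper below performs the same mixed signed/unsigned comparison as the loop test. The name `mixed_compare` is hypothetical and only for illustration. The point is that -1 becomes `UINT_MAX` after conversion, so it compares as greater than or equal to `0U`.

```cpp
#include <climits>

// Hypothetical helper: the same mixed comparison as the loop test.
// The int operand `index` is converted to unsigned int before comparing.
constexpr bool mixed_compare(int index, unsigned int bound) {
    return index >= bound;
}

// After the conversion, -1 is the largest unsigned int value.
static_assert(static_cast<unsigned int>(-1) == UINT_MAX, "-1 converts to UINT_MAX");
// So "-1 >= 0U" is true, while the signed comparison "-1 >= 0" is false.
static_assert(mixed_compare(-1, 0U), "unsigned comparison is true");
static_assert(!(-1 >= 0), "signed comparison is false");
```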

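threadIdx.x is an unsigned int. A plain 0 literal without a U suffix is a signed int, so replacing threadIdx.x with 0 turns the test into a signed comparison. By contrast, 0u*threadIdx.x is still unsigned, which is why your second kernel hangs too.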
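The types involved can be checked at compile time. The following is a plain-C++ sketch: `FakeUint3` and `fake_threadIdx` are hypothetical stand-ins for CUDA's `uint3` type and the built-in `threadIdx` variable, which only exist in device code.

```cpp
#include <type_traits>

// Hypothetical stand-in for CUDA's uint3: its components are unsigned int.
struct FakeUint3 { unsigned int x, y, z; };
// Hypothetical stand-in for threadIdx in a <<<1, 1>>> launch (the only thread is 0).
constexpr FakeUint3 fake_threadIdx{0u, 0u, 0u};

// A plain 0 literal is an int; 0U is an unsigned int.
static_assert(std::is_same<decltype(0), int>::value, "0 is int");
static_assert(std::is_same<decltype(0U), unsigned int>::value, "0U is unsigned int");
static_assert(std::is_same<decltype(fake_threadIdx.x), unsigned int>::value, "x is unsigned int");

// Mixing int with unsigned int yields unsigned int (usual arithmetic conversions).
static_assert(std::is_same<decltype(0 + fake_threadIdx.x), unsigned int>::value, "int + unsigned is unsigned");
// 0u * x is still unsigned, so the second kernel's comparison is unsigned as well.
static_assert(std::is_same<decltype(0u * fake_threadIdx.x), unsigned int>::value, "0u * x is unsigned");
```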

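With your <<<1, 1>>> launch, threadIdx.x is 0, and every unsigned value is >= 0. Once index reaches -1, it is converted to 4294967295 (UINT_MAX for a 32-bit unsigned int). As an unsigned comparison, your test is therefore always true, so the while loop never exits.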
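One way to keep the comparison signed is to cast the thread index to int. Thread indices are small enough to fit in an int, so the cast is safe. In the kernel, the test would then read `while (index >= (int) threadIdx.x)`.

Below is a host-side C++ sketch of that fix. `final_index` is a hypothetical name, and its parameter `tid` stands in for threadIdx.x.

```cpp
#include <cassert>

// Hypothetical host-side model of the kernel loop; `tid` stands in for threadIdx.x.
// Casting tid to int makes this a signed comparison, so the loop terminates.
int final_index(unsigned int tid) {
    int index = 0;
    while (index >= static_cast<int>(tid)) {  // signed comparison
        index--;
    }
    // On exit, index is below the (signed) thread index.
    assert(index < static_cast<int>(tid));
    return index;
}
```

For thread 0, the loop runs once and stops at -1; for any larger thread index, it does not run at all. For host code, GCC's -Wsign-compare (enabled by -Wall in C++) warns about this kind of signed/unsigned comparison.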

Also note that a __global__ function must have a void return type.

Finally, kernel launches are asynchronous. Without a cudaDeviceSynchronize() or another synchronizing call after the kernel launch, your program will exit "normally" anyway, even if the kernel hangs.

So I don't think the code you've posted actually reproduces the issue you're describing, but if you add the cudaDeviceSynchronize() it will.
