Sam

Reputation: 557

CUDA kernel infinite loop when comparing int to threadIdx.x

I had an issue with a much larger kernel, but it seems to distil down to the following code, from which the kernel never returns. Can someone please explain why there is an infinite loop?

__global__ void infinite_while_kernel(void)
{
    int index = 0;
    while (index >= threadIdx.x) {
        index--;
    }
    return;
}

int main(void) {
    infinite_while_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

In addition, the below kernel also gets stuck:

__global__ void not_infinite_while_kernel(void)
{
    int index = 0;
    while (index >= (unsigned int) 0u*threadIdx.x) {
        index--;
    }
    return;
}

If I replace threadIdx.x with 0 in the original kernel, it returns as expected. I'm using the v5.5 toolkit and compiling with the -arch=sm_20 -O0 flags, running on a Tesla M2090. I do not currently have access to any other hardware or toolkit versions (it's not my system).

Upvotes: 1

Views: 1906

Answers (1)

Robert Crovella

Reputation: 151879

This code hangs in ordinary C++ as well (try it):

int main() {
    int index = 0;
    while (index >= 0U) {
        index--;
    }
    return 0;
}

When an int is compared with an unsigned int, the usual arithmetic conversions convert the signed operand to unsigned before the comparison is made.

threadIdx.x is an unsigned value. An unmarked 0 constant in your code is not.

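As an unsigned comparison, your test is always true when threadIdx.x is 0 (as in your single-thread launch): an unsigned value can never be less than 0, and once index reaches -1 it is converted to a very large unsigned value (UINT_MAX). So the while loop never exits.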
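To make the conversion concrete, here is a minimal host-side C++ sketch. The names mixed_compare, signed_compare and count_iterations are hypothetical helpers written for illustration, and the tid parameter is only a stand-in for threadIdx.x. It assumes the fix you want is a signed comparison, which is one option among several (another would be to restructure the loop so index never goes negative).

```cpp
#include <type_traits>

// Mixing int and unsigned int in an expression yields unsigned int.
static_assert(std::is_same<decltype(0 + 0u), unsigned int>::value,
              "int combined with unsigned int gives unsigned int");

// Spells out what the compiler does implicitly for `index >= tid`
// when index is int and tid is unsigned int (like threadIdx.x).
bool mixed_compare(int index, unsigned int tid) {
    return static_cast<unsigned int>(index) >= tid;
}

// One possible fix: convert the unsigned value to int so that the
// comparison is signed. Assumes tid fits in an int, which holds for
// thread indices.
bool signed_compare(int index, unsigned int tid) {
    return index >= static_cast<int>(tid);
}

// The loop from the question with the signed comparison applied.
// max_iters is only a safety bound for this sketch.
int count_iterations(unsigned int tid, int max_iters) {
    int index = 0;
    int iters = 0;
    while (signed_compare(index, tid) && iters < max_iters) {
        index--;
        iters++;
    }
    return iters;
}
```

With tid equal to 0, mixed_compare(-1, 0u) is true because -1 becomes UINT_MAX, while signed_compare(-1, 0u) is false, so the corrected loop stops after a single iteration. In the kernel, the equivalent change would be comparing index against (int)threadIdx.x.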

Also note that a __global__ function must be declared with a void return type.

Finally, without a cudaDeviceSynchronize() or other barrier in your code following the kernel launch, your program will exit "normally" anyway, even if the kernel hangs.

So I don't think the code you've posted actually reproduces the issue you're describing, but if you add the cudaDeviceSynchronize() it will.

Upvotes: 4
