Reputation: 557
I had an issue with a much larger kernel, but it seems to distil down to the following code, from which the kernel never returns. Can someone please explain why there is an infinite loop?
__global__ void infinite_while_kernel(void)
{
    int index = 0;
    while (index >= threadIdx.x) {
        index--;
    }
    return;
}

int main(void) {
    infinite_while_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
In addition, the below kernel also gets stuck:
__global__ void not_infinite_while_kernel(void)
{
    int index = 0;
    while (index >= (unsigned int) 0u*threadIdx.x) {
        index--;
    }
    return;
}
Replacing threadIdx.x with 0 in the original kernel makes it return, as expected. I'm using the v5.5 toolkit and compiling with the -arch=sm_20 -O0 flags, running on a Tesla M2090. I do not currently have access to any other hardware or toolkit versions (it's not my system).
Upvotes: 1
Reputation: 151879
This code hangs in ordinary C++ as well (try it):
int main(){
    int index = 0;
    while (index >= 0U){
        index--;
    }
    return 0;
}
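When comparing a signed value to an unsigned value, the compiler converts the signed value to unsigned.
threadIdx.x is an unsigned value. An unmarked 0 constant in your code is not.
Because the comparison is done as an unsigned comparison, and threadIdx.x is 0 in your single-thread launch, your test is always true: once index is decremented below 0, it is converted to a very large unsigned value, and no unsigned value is less than 0. So the while loop never exits.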
Also note that your __global__ function must be declared with a void return type.
Finally, kernel launches are asynchronous, so without a cudaDeviceSynchronize() or other barrier in your code following the kernel launch, your program will exit "normally" anyway, even if the kernel hangs.
So I don't think the code you've posted actually reproduces the issue you're describing, but if you add the cudaDeviceSynchronize() it will.
Upvotes: 4