Reputation: 41
I have a problem in OpenCL 1.2.
I have a __global array in the kernel, and the group size is 1000.
The problem is that the atomic_add() function doesn't work correctly.
My kernel code is:
__kernel void kernelfunction(__global uint32_t* buffer){
buffer[3] = 100;
atomic_add(&buffer[3], 1);
...
}
If I create 1000 threads, I expect the value of buffer[3] to be 1100. Am I right?
But the behavior of the program is unpredictable:
sometimes the result is 1100, sometimes 1064, sometimes 1093, and so on.
What I have tried:
I also enabled the OpenCL extension as shown below:
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
but the problem still exists.
In another, simpler OpenCL project that I created, atomic_add works correctly. I've checked almost the entire project configuration, but I can't find where the problem is.
Can you help me? Thanks.
Upvotes: 3
Views: 2399
Reputation: 11918
Without atomics, a plain access to the same element from many threads is a race condition. Even worse, the data may be cached per compute unit and not written back until the kernel ends.
buffer[3] = 100;
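This is undefined behaviour, because every thread executes this plain store with no ordering relative to the atomic_add calls of the other threads. The result could even be 101, if one thread stores 100 after all the others have finished adding and then adds its own 1.
Even threads in the same local group can't see consistent data without a synchronization command.
Initialization should be done by the host, because the GPU runs threads concurrently, not serially (only the atomic operations themselves are serialized). Alternatively, each group initializes its own data (not visible from other groups) and then calls barrier(CLK_GLOBAL_MEM_FENCE) so that the other threads in the same group see the initialized value.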
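As a minimal sketch of the host-initialization approach (not the asker's full kernel, whose remaining body is elided): the host is assumed to write 100 into buffer[3] before the kernel is enqueued, for example with clEnqueueWriteBuffer, and the kernel then touches that element only through atomic_add. The kernel name count_workitems is made up for illustration. The #ifndef block at the top is also an assumption added here: it is a non-atomic stand-in so the same text compiles as plain C for a single-threaded sanity check, and an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C shim (assumption, not part of OpenCL): lets this sketch compile
   on the host for a sequential check. An OpenCL compiler defines
   __OPENCL_VERSION__ and therefore skips this whole branch. */
#define __kernel
#define __global
static unsigned int atomic_add(volatile unsigned int *p, unsigned int v) {
    unsigned int old = *p;   /* NOT atomic: valid only for one host thread */
    *p += v;
    return old;              /* like OpenCL, returns the previous value */
}
#endif

/* Hypothetical kernel name. The host is assumed to have stored 100 in
   buffer[3] before launching, so no work-item ever writes it directly. */
__kernel void count_workitems(__global unsigned int *buffer) {
    /* The atomic increment is the only access to buffer[3]. */
    atomic_add(&buffer[3], 1u);
}
```

With the plain store removed from the kernel, 1000 work-items should leave buffer[3] at 1100 however they are scheduled, because every update goes through the atomic.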
Upvotes: 7