Mahdi Nejadsahebi

Reputation: 41

OpenCL atomic_add() function returns wrong value

I have a problem in OpenCL 1.2. I have a __global array in the kernel, and the group size is 1000. The atomic_add() function doesn't seem to work correctly.

My kernel code is:

__kernel void kernelfunction(__global uint32_t* buffer){

buffer[3] = 100;

atomic_add(&buffer[3], 1);

...

}

If I create 1000 threads, I expect the value of buffer[3] to be 1100. Am I right? Instead, the result varies from run to run: sometimes it is 1100, sometimes 1064, sometimes 1093, and so on.

What I have tried:

I also enabled the OpenCL extension like this:

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

but the problem still exists. In another project I created a simple OpenCL program and atomic_add() worked correctly there. I've checked almost the entire project configuration, but I can't find where the problem is.

Can you help me? Thanks.

Upvotes: 3

Views: 2399

Answers (1)

huseyin tugrul buyukisik

Reputation: 11918

Without atomics, a plain access to the same element from many threads is a race condition. Even worse, the data may be cached per compute unit and not written back until the kernel ends.

buffer[3] = 100;

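This is undefined behaviour. Every thread executes this plain store, so a thread that runs late can overwrite increments that other threads have already added. The result could even be 101.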

Even threads in the same local group can't see up-to-date data without a synchronization command.

Initialization should be done by the host, because the GPU runs threads concurrently, not serially (atomics excepted). Alternatively, each group can initialize an element of its own (one that other groups don't touch) and then call barrier(CLK_GLOBAL_MEM_FENCE) so that the other threads in the same group see the initialized value.
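As a rough sketch of both options, something like the kernels below should behave deterministically. The kernel names, the use of uint as the element type, and the one-counter-per-group layout in the second kernel are assumptions made for illustration and are not taken from the question. The first kernel assumes the host has already written 100 into buffer[3] (for example with clEnqueueWriteBuffer) before the kernel is enqueued. The second assumes the buffer holds at least one element per work-group.

```c
// Option 1: the host initializes buffer[3] = 100 before launch,
// so the kernel performs only the atomic increment.
__kernel void add_only(__global uint *buffer)
{
    atomic_add(&buffer[3], 1u);
}

// Option 2 (hypothetical layout): each work-group owns one counter,
// buffer[group id], which no other group reads or writes.
__kernel void add_per_group(__global uint *buffer)
{
    __global uint *slot = &buffer[get_group_id(0)];

    // Exactly one work-item per group performs the plain store.
    if (get_local_id(0) == 0)
        *slot = 100;

    // Every work-item in the group reaches this barrier; the global
    // fence makes the store above visible to the rest of the group.
    barrier(CLK_GLOBAL_MEM_FENCE);

    atomic_add(slot, 1u);
}
```

With the first kernel and 1000 work-items, buffer[3] should end at 1100. With the second, each group's counter should end at 100 plus that group's size.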

Upvotes: 7
