Reputation: 11
When I write CUDA code, I use an atomic operation to force a global synchronization at the last step.
Now I also have to implement the same task in OpenCL. Is there an operation in OpenCL similar to CUDA's atomic operations that I can use? My device is an FPGA board.
Upvotes: 1
Views: 409
Reputation: 965
According to your comment, it seems like you want atomic operations on float values.
Please check out this link: atomic operation and floats in opencl
The idea is to use the built-in atom_cmpxchg operation to try to swap the old value of a floating-point variable with a new value. The new value could be the old value plus another value, or multiplied by, divided by, or minus it, etc.
The swap only succeeds if the variable still holds the old value (that's where the "cmp" comes into play). Otherwise, the attempt is repeated in a while loop. Because atom_cmpxchg operates on integers, the float's bit pattern is reinterpreted as an int for the comparison.
Note that this atomic operation can be quite slow if many threads perform it on the same value.
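Below is a minimal sketch of that retry loop. It is written as host-side C11 rather than OpenCL C so it can be compiled and tested anywhere. The names `float_to_bits`, `bits_to_float`, `atomic_update_float`, `op_add` and `op_mul` are hypothetical. In a kernel, the same shape would use `atomic_cmpxchg` (named `atom_cmpxchg` in the OpenCL 1.0 extension) on a `volatile __global int *`, with `as_int()`/`as_float()` in place of `memcpy`. This assumes the FPGA toolchain supports 32-bit global atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* The float is stored as its 32-bit pattern, like as_int()/as_float() in a kernel. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* The operation applied to the current value (add, multiply, ...). */
typedef float (*float_op)(float current, float operand);
static float op_add(float a, float b) { return a + b; }
static float op_mul(float a, float b) { return a * b; }

/* Atomically replaces *bits with op(old, operand) and returns the old value.
   The exchange only succeeds if *bits still equals old_bits; on failure,
   old_bits is refreshed with the current contents and the loop retries. */
static float atomic_update_float(_Atomic uint32_t *bits, float_op op, float operand)
{
    uint32_t old_bits = atomic_load(bits);
    for (;;) {
        uint32_t new_bits = float_to_bits(op(bits_to_float(old_bits), operand));
        if (atomic_compare_exchange_weak(bits, &old_bits, new_bits))
            return bits_to_float(old_bits);
    }
}
```

For example, starting from `1.5f`, calling `atomic_update_float(&acc, op_add, 2.25f)` returns `1.5f` and leaves `3.75f` stored. The comparison is done on raw bits, so it is an exact "has anyone changed this since I read it?" check, not a floating-point equality test.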
Upvotes: 0
Reputation: 6343
There is no kernel-level global synchronization in OpenCL or CUDA, since entire work-groups may finish before others have even started. Only work-group-level synchronization is available inside a kernel. For global synchronization you must use multiple kernels.
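The multi-kernel approach can be sketched as a two-pass reduction. This is a plain C model, not OpenCL host code. Each outer loop iteration stands in for one work-group. The point between the two function calls stands in for the kernel boundary, for example two `clEnqueueNDRangeKernel` calls on an in-order command queue. `GROUP_SIZE` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4  /* hypothetical work-group size */

/* "Kernel 1": each work-group g reduces its own slice into partial[g].
   No work-group reads another's result, so no global sync is needed inside it. */
static void kernel1_partial_sums(const float *in, size_t n, float *partial, size_t num_groups)
{
    for (size_t g = 0; g < num_groups; ++g) {            /* work-groups: any order on a device */
        float sum = 0.0f;
        for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {  /* work-items of group g */
            size_t i = g * GROUP_SIZE + lid;
            if (i < n)
                sum += in[i];
        }
        partial[g] = sum;
    }
}

/* "Kernel 2": runs only after kernel 1 has fully completed, so every partial[g] is ready. */
static float kernel2_final_sum(const float *partial, size_t num_groups)
{
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        total += partial[g];
    return total;
}

/* Host side: partial must have room for ceil(n / GROUP_SIZE) floats. */
static float two_pass_sum(const float *in, size_t n, float *partial)
{
    assert(in != NULL && partial != NULL);
    size_t num_groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    kernel1_partial_sums(in, n, partial, num_groups);
    /* Kernel boundary: this is the global synchronization point. */
    return kernel2_final_sum(partial, num_groups);
}
```

The design point is that kernel 1 never needs another group's data. Anything that does need all groups' results is moved into kernel 2, after the boundary.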
Upvotes: 1
Reputation:
barrier() may be similar to what you are looking for, but it can only force a "join" on work-items (threads) in the same work-group.
See this post. You may be able to use barrier(CLK_GLOBAL_MEM_FENCE) to get the results you are looking for, though it too only orders global-memory accesses among the work-items of a single work-group.
Stack Overflow: Barriers in OpenCL
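To make the scope of barrier() concrete, here is a plain C model of a single work-group doing a local-memory tree reduction. It is an illustration, not OpenCL code. Each loop over `lid` runs the code between two barriers for every work-item, so no work-item starts the next phase before all of them have finished the current one. That is the guarantee barrier(CLK_LOCAL_MEM_FENCE) gives within one work-group. `LOCAL_SIZE` and `workgroup_sum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 8  /* hypothetical work-group size */
static_assert((LOCAL_SIZE & (LOCAL_SIZE - 1)) == 0, "tree reduction needs a power-of-two size");

/* Models ONE work-group. Each loop over lid runs the code between two barrier()
   calls for every work-item, and the next phase starts only after that loop ends:
   the guarantee barrier(CLK_LOCAL_MEM_FENCE) provides inside a work-group. */
static float workgroup_sum(const float *group_in)
{
    float scratch[LOCAL_SIZE];                /* stands in for __local memory */

    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
        scratch[lid] = group_in[lid];         /* phase 0: each work-item loads one value */
    /* barrier(CLK_LOCAL_MEM_FENCE); */

    for (size_t stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
        /* Work-items with lid < stride fold in a partner's value. Reads hit
           indices >= stride and writes hit indices < stride, so order is irrelevant. */
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
        /* barrier(CLK_LOCAL_MEM_FENCE); */
    }
    return scratch[0];                        /* result held by work-item 0 */
}
```

Nothing in this pattern synchronizes separate work-groups. Combining the per-group results still needs a second kernel or an atomic update.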
Upvotes: 3