I have a test kernel that I'm calling from the Julia OpenCL API. The fact that I'm calling it from Julia isn't important; it's just what I'm using to run OpenCL. Here is the code:
using OpenCL
const cl = OpenCL
device, ctx, queue = cl.create_compute_context()
# Output buffer: one float per work-group
C_buff = cl.Buffer(Float32, ctx, :w, 2)
const testkernel = """
kernel void test(global float *C)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    // One x shared by all work-items in the work-group
    local float x;
    if (lid == 0)
    {
        x = 0.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // Every work-item increments the shared x
    x += 1.0f;
    barrier(CLK_LOCAL_MEM_FENCE);
    // The first work-item of each group writes the group's result
    if (lid == 0)
    {
        C[gid / 2] = x;
    }
}
"""
program = cl.Program(ctx, source=testkernel) |> cl.build!
kernel = cl.Kernel(program, "test")
# 4 work-items in total, split into work-groups of 2
cl.call(queue, kernel, 4, 2, C_buff)
cl.read(queue, C_buff)
What I can't figure out is why this returns the vector [1.0, 1.0] when it seems like it should return [2.0, 2.0]. I have 4 work-items split into two work-groups of 2 work-items each.
One local float x is instantiated for each work-group, and the first work-item in each work-group sets it to 0. Then each work-item in the work-group adds 1 to it, so since there are two work-items in each work-group it should end up as 2. But when I read back C, I get a vector of ones instead.
barrier(CLK_LOCAL_MEM_FENCE); x += 1.0f; barrier(CLK_LOCAL_MEM_FENCE);
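Barriers aren't mutexes. A barrier only guarantees that every work-item in the work-group reaches it before any of them continue past it; it doesn't make the code between two barriers run one work-item at a time. Between your two barriers, both work-items execute x += 1.0f on the same local variable at the same time, and that statement is a separate read, add and write, so you have a data race: both work-items can read 0.0f and both write back 1.0f.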
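You'll have to use atomics or redesign your code. Note that the built-in arithmetic atomics in OpenCL 1.1/1.2, such as atomic_add and atomic_inc, only operate on 32-bit int/uint values, so an atomic version would need x to be a local int rather than a float.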