Reputation: 2549
I have a kernel which has 3 phases. Each phase has to complete before execution moves on to the next phase. I'm not sure how to do this, or whether CLK_GLOBAL_MEM_FENCE is meant for it. (I'm getting pretty strange results from a water erosion kernel.)
A pseudocode illustrating my problem:
void kernel krnl(__global float* data,
                 __global float* avg)
{
    int2 pos = (int2)(get_global_id(0), get_global_id(1));
    pos.x = max(pos.x, 1);
    pos.y = max(pos.y, 1);
    data[pos.x + get_global_size(0)*pos.y] = (pos.y+pos.x)/2.0f; //just some random stuff here
    //globalSync();  // the global synchronisation I am looking for
    avg[pos.x + get_global_size(0)*pos.y] = data[pos.x + 1 + get_global_size(0)*pos.y];
    avg[pos.x + get_global_size(0)*pos.y] += data[pos.x - 1 + get_global_size(0)*pos.y];
    avg[pos.x + get_global_size(0)*pos.y] += data[pos.x + get_global_size(0)*(pos.y-1)];
    avg[pos.x + get_global_size(0)*pos.y] += data[pos.x + get_global_size(0)*(pos.y+1)];
    avg[pos.x + get_global_size(0)*pos.y] /= 4.0f;
}
First it should fill the buffer with "random" numbers, then mix each value with its neighbours.
So, what are the possibilities for doing this kind of sync? Is it possible to do it in one kernel, with only the buffers that are strictly necessary, or do I have to add separate in and out buffers instead of only read_write ones? Or is it a better idea to create multiple kernels that share the buffers?
Upvotes: 1
Views: 147
Reputation: 9925
OpenCL does not provide any method of synchronising global memory across all work-groups executing a kernel. This is because OpenCL is designed to run on a large number of different devices, and not all of these devices can guarantee that all work-groups in an arbitrarily large kernel invocation will run concurrently (and make independent forward progress with respect to each other).
So, you'll need to use multiple kernels to implement this sort of thing. Alternatively, you might consider whether you can implement your algorithm such that you only need to synchronise memory within a given work-group, which you can do with the barrier function.
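As a rough sketch of the multi-kernel approach, the pseudocode from the question could be split at the point where the global sync was wanted. The kernel names fill and average are made up for illustration, and clamping the neighbour indices at the grid edges is an assumption on my part, since the question does not say how the borders should be handled:

```opencl
// Phase 1: every work-item writes only its own cell of data.
__kernel void fill(__global float* data)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    const int w = get_global_size(0);

    data[x + w * y] = (x + y) / 2.0f; // placeholder values, as in the question
}

// Phase 2: runs as a separate kernel, so every write from phase 1 is complete.
__kernel void average(__global const float* data,
                      __global float* avg)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    const int w = get_global_size(0);
    const int h = get_global_size(1);

    // Assumption: neighbours outside the grid are clamped to the nearest edge cell.
    const int xl = max(x - 1, 0);
    const int xr = min(x + 1, w - 1);
    const int yu = max(y - 1, 0);
    const int yd = min(y + 1, h - 1);

    avg[x + w * y] = (data[xl + w * y] + data[xr + w * y]
                    + data[x + w * yu] + data[x + w * yd]) / 4.0f;
}
```

On the host you would enqueue fill and then average with clEnqueueNDRangeKernel, using the same global size and the same data buffer. On an in-order command queue (the default) the second kernel does not start until the first has finished, so no extra synchronisation call is needed between them, and data can stay a single read_write buffer.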
Upvotes: 4