ronhartleyone

Reputation: 73

Is it defined to write to the same buffer from different kernels?

I have OpenCL 1.1, a single device, and an out-of-order command queue. I want multiple kernels to write their results into one buffer, each to a different, arbitrary, non-overlapping region. Is this possible?

cl::CommandQueue commandQueue(context, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE);

cl::Buffer buf_as(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, data_size, &as[0]);
cl::Buffer buf_bs(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, data_size, &bs[0]);

cl::Buffer buf_rs(context, CL_MEM_WRITE_ONLY, data_size, NULL);

cl::Kernel kernel(program, "dist");

kernel.setArg(0, buf_as);
kernel.setArg(1, buf_bs);

int const N = 4;
int const d = data_size / N;
std::vector<cl::Event> events(N);

for(int i = 0; i != N; ++i) {
    int const beg = d * i;
    int const len = d;

    // Arguments are captured at enqueue time, so the same kernel object can be reused.
    kernel.setArg(2, beg);
    kernel.setArg(3, len);

    commandQueue.enqueueNDRangeKernel(kernel, cl::NullRange, cl::NDRange(block_size_x), cl::NDRange(block_size_x), NULL, &events[i]);
}

commandQueue.enqueueReadBuffer(buf_rs, CL_FALSE, 0, data_size, &rs[0], &events, NULL);

commandQueue.finish();

Upvotes: 0

Views: 601

Answers (3)

Hashman

Reputation: 377

I don't think it is defined. Although the regions you write to do not overlap at the software level, there is no guarantee that the accesses won't map onto the same cache lines at the hardware level, in which case you'll have multiple modified versions of a line in flight at once.
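To make the cache-line concern concrete, here is a minimal sketch of splitting a buffer into regions whose starting offsets fall on a cache-line boundary, so that no two regions share a line. `Region` and `split_aligned` are hypothetical names, not part of OpenCL, and the 64-byte line size in the usage note is an assumption (the real value can be queried with `CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE`). It also assumes the buffer itself starts on a line boundary. This only addresses line sharing; it does not by itself make concurrent writes defined by the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous region of the output buffer, measured in bytes.
struct Region {
    std::size_t begin;  // byte offset of the first byte in the region
    std::size_t size;   // number of bytes; trailing regions may be empty
};

// Hypothetical helper (not an OpenCL API): split [0, total_bytes) into
// `count` non-overlapping regions. Every non-empty region starts at a
// multiple of `alignment`, so two regions never share an aligned block.
std::vector<Region> split_aligned(std::size_t total_bytes, std::size_t count,
                                  std::size_t alignment) {
    assert(count > 0 && alignment > 0);

    // Bytes per region, rounded up to a whole number of alignment units.
    std::size_t per_region = (total_bytes + count - 1) / count;
    std::size_t chunk = (per_region + alignment - 1) / alignment * alignment;

    std::vector<Region> regions;
    regions.reserve(count);
    std::size_t begin = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // The last non-empty region takes whatever is left over.
        std::size_t remaining = total_bytes - begin;
        std::size_t size = remaining < chunk ? remaining : chunk;
        regions.push_back(Region{begin, size});
        begin += size;
    }
    return regions;
}
```

For example, `split_aligned(1000, 4, 64)` yields regions starting at byte offsets 0, 256, 512 and 768, with the last one holding the remaining 232 bytes. Each `begin` and `size` could then be converted to element units and passed as the `beg` and `len` kernel arguments from the question.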

Upvotes: 0

Lee

Reputation: 930

I wanted to give an official committee response to this. We realise the specification is ambiguous and have made modifications to rectify this.

This is not guaranteed under OpenCL 1.x or indeed 2.0 rules. cl_mem objects are only guaranteed to be consistent at synchronization points, even when processed only on a single device and even when used by OpenCL 2.0 kernels using memory_scope_device.

Multiple child kernels of an OpenCL 2.0 parent kernel can share the parent's cl_mem objects at device scope.

Coarse-grained SVM objects can be shared at device scope between multiple kernels, as long as the memory locations written to are not overlapping.

Upvotes: 1

mfa

Reputation: 5087

The writes should work fine if the global memory addresses are non-overlapping, as you have described. Just make sure all of the kernels have finished before reading the results back to the host.

Upvotes: 0
