How does clCreateBuffer use CL_MEM_*_HOST_PTR for kernel's output?

Question

I tried an implicit way to read an OpenCL kernel's results back from the device to the host:

input_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int) * n, input_data, &_err);
output_buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * n, output_data, &_err);
clEnqueueNDRangeKernel(...);

With the code above, I get the correct kernel results in output_data. However, after I changed CL_MEM_USE_HOST_PTR to CL_MEM_COPY_HOST_PTR when creating output_buffer, output_data stayed unchanged, as if the kernel had never run. As far as I know, CL_MEM_COPY_HOST_PTR handles the host-to-device transfer by first copying the contents pointed to by input_data into a separate memory space, and input_buffer then supplies the kernel's input from that new space. I suspect the same thing happens for output_buffer: the kernel's results are written somewhere else and are never copied back into the memory pointed to by output_data.

Roman Arzumanyan · Accepted Answer

According to the OpenCL standard:

CL_MEM_COPY_HOST_PTR indicates that the application wants the OpenCL implementation to allocate memory for the memory object and copy the data from memory referenced by host_ptr. It doesn't indicate on which side the memory will be allocated.
CL_MEM_USE_HOST_PTR indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object. OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.

So, if you use CL_MEM_USE_HOST_PTR, data transfer between device and host is transparent to the application. There is no guarantee that any memory will be allocated on the device side.
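The difference between the two flags can be illustrated with a small host-only model. This is a sketch in plain C, not OpenCL code: model_buffer and every model_* function are hypothetical names invented for the illustration, and the "kernel" is an ordinary loop. It also assumes the simplest possible behavior for CL_MEM_USE_HOST_PTR (no device-side caching), which a real implementation is not obliged to follow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cl_mem buffer; NOT part of the OpenCL API. */
typedef struct {
    int   *storage;       /* memory the "kernel" reads and writes */
    int    owns_storage;  /* 1 if the model allocated storage itself */
    size_t n;             /* number of int elements */
} model_buffer;

/* Models CL_MEM_COPY_HOST_PTR: separate storage, host data copied in once. */
model_buffer model_create_copy(const int *host_ptr, size_t n)
{
    assert(host_ptr != NULL);
    model_buffer b = { malloc(n * sizeof(int)), 1, n };
    assert(b.storage != NULL);
    memcpy(b.storage, host_ptr, n * sizeof(int));
    return b;
}

/* Models CL_MEM_USE_HOST_PTR without caching: the host array is the storage. */
model_buffer model_create_use(int *host_ptr, size_t n)
{
    assert(host_ptr != NULL);
    model_buffer b = { host_ptr, 0, n };
    return b;
}

/* Stand-in for a kernel: writes 2 * i into element i of the buffer. */
void model_run_kernel(model_buffer *b)
{
    for (size_t i = 0; i < b->n; ++i)
        b->storage[i] = (int)(2 * i);
}

/* Models an explicit read-back: copies the buffer's storage into dst. */
void model_read(const model_buffer *b, int *dst)
{
    /* memmove, because dst may be the storage itself in the "use" case */
    memmove(dst, b->storage, b->n * sizeof(int));
}

/* Frees storage only when the model allocated it. */
void model_release(model_buffer *b)
{
    if (b->owns_storage)
        free(b->storage);
    b->storage = NULL;
}
```

In this model, running the kernel on a buffer from model_create_copy leaves the original host array untouched until model_read is called, which mirrors the behavior described in the question. With model_create_use the host array changes directly, but in real OpenCL that visibility should not be relied on without mapping the buffer first, because the implementation may be working on a cached copy.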

Returning to your code snippet:

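input_buffer = clCreateBuffer(...) allocates sizeof(int) * n bytes of storage managed by the implementation, typically on the device side, and copies the contents of input_data into it.
output_buffer = clCreateBuffer(...) does not have to allocate any memory on the device side. Your OpenCL implementation may cache the contents of the host-side memory to which output_data points, but you will never know.

This is also why output_data stays unchanged once output_buffer is created with CL_MEM_COPY_HOST_PTR: the kernel writes into the buffer's own storage, and nothing copies that storage back into output_data unless you read the buffer explicitly.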
