Reputation: 797
I tried an implicit way to read an OpenCL kernel's results back from the device to the host:
input_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int) * n, input_data, &_err);
output_buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * n, output_data, &_err);
clEnqueueNDRangeKernel(...);
In the code snippet above, I could get correct results from the kernel in output_data. However, after I changed CL_MEM_USE_HOST_PTR to CL_MEM_COPY_HOST_PTR in the creation of output_buffer, output_data stayed the same, as if the kernel hadn't been executed. As far as I know, CL_MEM_COPY_HOST_PTR handles the host-to-device transfer by first copying the contents pointed to by input_data into a different memory space; input_buffer then gets the kernel's input from this new space. I suspect the same thing happens for output_buffer: the kernel's results are written somewhere else and are never moved into the space pointed to by output_data.
Upvotes: 1
Views: 1382
Reputation: 6333
With OpenCL 1.x, after you run your kernel you must use either clEnqueueMapBuffer or clEnqueueReadBuffer to get the data back into host memory for further use by the CPU. There's no way around it.
Upvotes: 3
Reputation: 1814
According to the OpenCL standard:
CL_MEM_COPY_HOST_PTR indicates that the application wants the OpenCL implementation to copy the data from memory referenced by host_ptr. It doesn't indicate on which side the memory will be allocated.
CL_MEM_USE_HOST_PTR indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object. OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.
So, if you use CL_MEM_USE_HOST_PTR, data exchange between device and host is transparent. There is no guarantee that any memory will be allocated on the device side.
Going back to your code snippet:
input_buffer = clCreateBuffer(...) will lead to the allocation of sizeof(int) * n bytes on the device side.
output_buffer = clCreateBuffer(...) will not lead to memory allocation on the device side. Your OpenCL implementation may somehow cache the contents of the host-side memory that output_data points to, but you will never know.
Upvotes: 3