troore

Reputation: 797

How does clCreateBuffer use CL_MEM_*_HOST_PTR for kernel's output?

I tried an implicit way of reading an OpenCL kernel's results from the device back to the host:

input_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int) * n, input_data, &_err);
output_buffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(int) * n, output_data, &_err);
clEnqueueNDRangeKernel(...);

With the snippet above, I get the correct results from the kernel in output_data. However, after I changed CL_MEM_USE_HOST_PTR to CL_MEM_COPY_HOST_PTR when creating output_buffer, output_data stayed the same, as if the kernel had never been executed.

As I understand it, CL_MEM_COPY_HOST_PTR handles the host-to-device transfer by first copying the contents pointed to by input_data into a different memory space, from which input_buffer then supplies the kernel's input. I suspect the same thing happens for output_buffer: the kernel's results are written to that separate space and are never copied back into the memory pointed to by output_data.

Upvotes: 1

Views: 1382

Answers (2)

Dithermaster

Reputation: 6333

With OpenCL 1.x, after you run your kernel you must use either clEnqueueMapBuffer or clEnqueueReadBuffer to get the data back into host memory for further use by the CPU. There's no way around it.

Upvotes: 3

Roman Arzumanyan

Reputation: 1814

According to the OpenCL standard:

  • CL_MEM_COPY_HOST_PTR indicates that the application wants the OpenCL implementation to copy the data from the memory referenced by host_ptr. It does not indicate on which side the memory will be allocated.
  • CL_MEM_USE_HOST_PTR indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object. OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.

So, if you use CL_MEM_USE_HOST_PTR, data exchange between the device and the host is handled transparently by the implementation. There is no guarantee that any memory will be allocated on the device side at all.

Coming back to your code snippet:

  • input_buffer = clCreateBuffer(...) will lead to the allocation of sizeof(int) * n bytes on the device side.
  • output_buffer = clCreateBuffer(...) need not lead to any memory allocation on the device side. Your OpenCL implementation may cache the contents of the host memory that output_data points to, but you have no way of knowing.

Upvotes: 3
