Reputation: 2722
What is the meaning of the host_ptr parameter in clCreateBuffer?
cl_mem clCreateBuffer ( cl_context context,
cl_mem_flags flags,
size_t size,
void *host_ptr,
cl_int *errcode_ret )
It's not really clear from the documentation:
host_ptr : A pointer to the buffer data that may already be allocated by the application. The size of the buffer that host_ptr points to must be greater than or equal to the size bytes.
To me it sounds like a host buffer that is going to be copied into the device buffer. However, in many examples I see this copy actually performed by clEnqueueWriteBuffer, and nothing is passed to clCreateBuffer as the host_ptr parameter.
Could you clarify?
Upvotes: 1
Views: 1323
Reputation: 418
There are a couple of different uses for the host_ptr parameter, depending on the value passed into the flags parameter. The two flags that use it are CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR. If flags doesn't contain either of these and host_ptr isn't NULL, the function returns a CL_INVALID_HOST_PTR error (see the errors at the bottom of the clCreateBuffer page).
With CL_MEM_COPY_HOST_PTR, the buffer does basically what you described: the data at host_ptr is copied into the device's memory. This is a convenience that effectively combines clCreateBuffer and clEnqueueWriteBuffer. Notice that you don't need a command queue as clEnqueueWriteBuffer requires; the data will be copied to all devices on the context before you use them, how convenient!
As the name suggests, this is a COPY. If you write to this memory on the device, the memory in host_ptr will never see these writes (unless you perform a clEnqueueReadBuffer). Similarly, if you write to host_ptr after creating the buffer, the memory on the device will not see these writes (unless you perform a clEnqueueWriteBuffer).
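As a rough sketch of the two approaches (the helper name, the float array, and the pre-existing context/queue are my own assumptions, not from the question; error checking is abbreviated):

#include <CL/cl.h>

/* Sketch only: `upload_two_ways` is a made-up helper; `context` and
 * `queue` are assumed to have been created earlier, and `host_data`
 * holds `count` floats. */
cl_mem upload_two_ways(cl_context context, cl_command_queue queue,
                       const float *host_data, size_t count)
{
    cl_int err;
    size_t bytes = count * sizeof(float);

    /* Option A: CL_MEM_COPY_HOST_PTR copies host_data when the buffer is
     * created; no command queue is involved. */
    cl_mem buf_a = clCreateBuffer(context,
                                  CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  bytes, (void *)host_data, &err);

    /* Option B: create an uninitialized buffer, then copy explicitly.
     * This is the pattern seen in many examples. */
    cl_mem buf_b = clCreateBuffer(context, CL_MEM_READ_ONLY,
                                  bytes, NULL, &err);
    err = clEnqueueWriteBuffer(queue, buf_b, CL_TRUE /* blocking write */,
                               0, bytes, host_data, 0, NULL, NULL);

    clReleaseMemObject(buf_b);  /* this sketch only keeps one of them */
    return buf_a;
}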
With CL_MEM_USE_HOST_PTR, instead of creating a buffer in device memory, OpenCL will use the memory pointed to by host_ptr directly. If your device is a dedicated GPU on a PCIe connection, this means that every time device code reads or writes this memory, the data has to travel over PCIe (slow). OpenCL allows the device to cache the data during kernel execution, which helps somewhat. If your device shares physical memory with the host, like an integrated GPU, there shouldn't be any such overhead.
Note, though, that you have to be careful to ensure memory consistency. If the host modifies host_ptr while a kernel is working on the memory associated with it (or vice versa), you get undefined behavior.
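A minimal sketch of that pattern, assuming a context and queue already exist; the helper name is made up, and mapping the buffer before the host reads it is one conventional way to keep the contents consistent:

#include <CL/cl.h>

/* Sketch only: `wrap_host_array` is a made-up helper. `data` must stay
 * valid (and unmodified by the host while kernels run) for the lifetime
 * of the returned buffer. */
cl_mem wrap_host_array(cl_context context, cl_command_queue queue,
                       float *data, size_t count)
{
    cl_int err;
    size_t bytes = count * sizeof(float);

    /* The device works on the host allocation directly; no separate
     * device copy is created up front. */
    cl_mem buf = clCreateBuffer(context,
                                CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                bytes, data, &err);

    /* ... enqueue kernels that read/write buf here ... */

    /* Before the host touches `data` again, map the buffer so the
     * runtime can make the contents consistent, then unmap. */
    float *mapped = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                                CL_MAP_READ, 0, bytes,
                                                0, NULL, NULL, &err);
    /* ... read results through `mapped` ... */
    clEnqueueUnmapMemObject(queue, buf, mapped, 0, NULL, NULL);
    return buf;
}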
Which one should you use? It depends on what you are doing and what hardware you are working on.
If you aren't too worried about memory copying performance and want an easy and safe method, use CL_MEM_COPY_HOST_PTR.
For large arrays, I generally use CL_MEM_COPY_HOST_PTR and enqueue a read or write as needed. If I'm just writing to an int or a small structure, I would use CL_MEM_USE_HOST_PTR; just be careful about memory consistency.
If you are really interested in memory performance, you may have to look into techniques such as pinned memory (Reference for AMD GPUs).
If you are confident that the host and device(s) will not read the memory while the other is writing it, you will be fine using CL_MEM_USE_HOST_PTR for memory objects of any size. For maximum performance, you may have to make sure host_ptr is aligned and its size is a multiple of a certain value (for Intel devices, aligned on a 4 KB boundary and a multiple of 64 bytes). Here is more info for zero-copy on Intel hardware.
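A sketch of how such an allocation might look on a POSIX system, assuming the 4 KB / 64-byte figures above; the function name and rounding logic are illustrative, not part of any OpenCL API:

#include <CL/cl.h>
#include <stdlib.h>

/* Sketch only: `create_zero_copy_buffer` is a made-up helper that rounds
 * the size up to a multiple of 64 bytes and allocates on a 4 KB boundary
 * with posix_memalign, per the Intel guidance above. */
cl_mem create_zero_copy_buffer(cl_context context, size_t bytes, void **host_out)
{
    cl_int err;
    size_t rounded = (bytes + 63) & ~(size_t)63;  /* multiple of 64 bytes */
    void *host_block = NULL;

    if (posix_memalign(&host_block, 4096, rounded) != 0)  /* 4 KB boundary */
        return NULL;

    *host_out = host_block;  /* caller keeps this alive and frees it after
                              * releasing the buffer */
    return clCreateBuffer(context,
                          CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                          rounded, host_block, &err);
}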
Upvotes: 4