Reputation: 13
I was trying to use the CL_MEM_USE_HOST_PTR flag with the OpenCL function clCreateBuffer() in order to avoid multiple memory allocations. After a little research (reverse engineering), I found that the framework calls the operating system's allocation function no matter which flag I use.
Maybe my understanding is wrong? According to the documentation, though, it is supposed to use DMA to access the host memory instead of allocating new memory.
I am using OpenCL 1.2 on an Intel device (HD5500).
Upvotes: 1
Views: 860
Reputation: 2796
On Intel GPUs, ensure the host pointer you pass is page aligned and that its length is a whole number of pages. In fact, I think the buffer size only needs to be a whole number of cache lines, but I always round up to a full page.
Use something like:
void *host_ptr = _aligned_malloc(align_to(size, 4096), 4096);
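The one-liner above relies on an `align_to` helper that is not shown. A minimal sketch of what it might look like follows, assuming a 4096-byte page and a power-of-two alignment. The names `HOST_PAGE_SIZE`, `alloc_host_buffer`, and `free_host_buffer` are hypothetical, and the non-Windows branch swaps in C11 `aligned_alloc` because `_aligned_malloc` is specific to the Microsoft C runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Assumed page size; hypothetical name, not part of OpenCL. */
#define HOST_PAGE_SIZE ((size_t)4096)

/* Round size up to the next multiple of alignment.
   alignment must be a non-zero power of two. */
static size_t align_to(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Allocate a page-aligned buffer whose length is a whole number of pages.
   The padded length is written to *padded_size when it is not NULL. */
static void *alloc_host_buffer(size_t size, size_t *padded_size)
{
    size_t padded = align_to(size, HOST_PAGE_SIZE);
    if (padded_size != NULL) {
        *padded_size = padded;
    }
#ifdef _WIN32
    return _aligned_malloc(padded, HOST_PAGE_SIZE);
#else
    /* C11: the size passed must be a multiple of the alignment,
       which align_to guarantees here. */
    return aligned_alloc(HOST_PAGE_SIZE, padded);
#endif
}

/* Release a buffer obtained from alloc_host_buffer. */
static void free_host_buffer(void *ptr)
{
#ifdef _WIN32
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}
```

The padded size, not the originally requested size, would then be the size passed to clCreateBuffer() together with CL_MEM_USE_HOST_PTR.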
Here's a good article on this. From its "Key Takeaways":
If you already have the data and want to load the data into an OpenCL buffer object, then use CL_MEM_USE_HOST_PTR with a buffer allocated at a 4096 byte boundary (aligned to a page and cache line boundary) and a total size that is a multiple of 64 bytes (cache line size).
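The two conditions quoted above (4096-byte alignment, size a multiple of 64 bytes) are easy to check before calling clCreateBuffer(). Here is a small sketch of such a check; the function name `meets_zero_copy_rules` and the two constants are hypothetical, and passing the check only means the buffer satisfies the stated rules, not that a given driver will actually avoid the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the quoted takeaway; the names are hypothetical. */
#define ZC_PAGE_ALIGN 4096u
#define ZC_CACHE_LINE 64u

/* A page-aligned address is automatically cache-line aligned. */
static_assert(ZC_PAGE_ALIGN % ZC_CACHE_LINE == 0,
              "page alignment must be a multiple of the cache line size");

/* Return true when ptr and size satisfy the alignment and size rules
   stated for CL_MEM_USE_HOST_PTR on Intel GPUs. The pointer is only
   inspected as an address and is never dereferenced. */
static bool meets_zero_copy_rules(const void *ptr, size_t size)
{
    bool aligned = ((uintptr_t)ptr % ZC_PAGE_ALIGN) == 0;
    bool whole_cache_lines = size != 0 && (size % ZC_CACHE_LINE) == 0;
    return ptr != NULL && aligned && whole_cache_lines;
}
```

A check like this could sit in a debug assertion right before the buffer is created, so that a misaligned pointer is caught early rather than silently causing an extra allocation.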
You can also use CL_MEM_ALLOC_HOST_PTR and let the driver handle the allocation. To get at the pointer, though, you'll have to map and unmap the buffer with clEnqueueMapBuffer() and clEnqueueUnmapMemObject() (at no copy cost).
Upvotes: 2