Reputation: 1949
First, I am having a hard time figuring out how clCreateBuffer() works when passed CL_MEM_ALLOC_HOST_PTR. Does it create a buffer on the device AND allocate memory for the host, or does it only allocate memory on the host and cache it on the device when it's being used?
My problem is this: if I have quite a few objects whose float* fields total more space than is available on my device, is there a better way than telling the runtime to copy (or use) the host pointer for the OpenCL device? Is it possible to have the runtime create the host pointer and use that for all the float* fields, even if they total more memory than the device has? I wouldn't mind telling it to use the host pointer, but if I wanted to avoid memory copies when the runtime is on the CPU, I would have to align all the memory.
Also, are there any tips on good ways to deal with using more memory on the host than is available on the device, so that memory transfers are as efficient as possible and involve the least copying?
Thanks.
Upvotes: 2
Views: 2205
Reputation: 2565
The standard only states that:
This flag specifies that the application wants the OpenCL implementation to allocate memory from host accessible memory.
So, how it works under the hood is implementation dependent. NVIDIA states in section 3.3.1 of its OpenCL Programming Guide (V4.2) that:
objects using the CL_MEM_ALLOC_HOST_PTR flag (...) are likely to be allocated in page-locked memory by the driver for best performance.
AMD, in its own guide (here), gives in section 4.5.2 a table showing where memory objects are located for each flag value. The entire section 4.5 is dedicated to OpenCL memory objects. You might find it interesting.
Regarding your problem, if the data does not fit in device memory, there is no other solution (at least that I can think of) than to split your data and process it in several passes, as suggested here.
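To make the "several passes" idea concrete, here is a minimal sketch in plain C of the chunking loop only. It is an illustration under stated assumptions, not a complete OpenCL program: `process_in_chunks`, `chunk_fn` and `scale_by_two` are hypothetical names, and the callback stands in for the place where a real program would write one chunk to a device buffer, run the kernel, and read the results back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: in a real OpenCL program, this is where one
 * chunk would be written to a device buffer, the kernel launched, and the
 * results read back to the host. */
typedef void (*chunk_fn)(float *chunk, size_t count, void *user);

/* Walk `total` floats in passes of at most `chunk_elems` elements, where
 * `chunk_elems` is assumed to be chosen so that one chunk fits in device
 * memory. Returns the number of passes performed. */
size_t process_in_chunks(float *data, size_t total, size_t chunk_elems,
                         chunk_fn fn, void *user)
{
    assert(chunk_elems > 0);
    size_t passes = 0;
    for (size_t offset = 0; offset < total; offset += chunk_elems) {
        size_t remaining = total - offset;
        /* The last pass may be shorter than the others. */
        size_t count = remaining < chunk_elems ? remaining : chunk_elems;
        fn(data + offset, count, user);
        ++passes;
    }
    return passes;
}

/* Stand-in for the device kernel: doubles each element on the host. */
void scale_by_two(float *chunk, size_t count, void *user)
{
    (void)user; /* unused in this sketch */
    for (size_t i = 0; i < count; ++i)
        chunk[i] *= 2.0f;
}
```

Calling `process_in_chunks(v, 10, 4, scale_by_two, NULL)` on a 10-element array would run three passes of 4, 4 and 2 elements. In practice, the chunk size would be derived from what the device reports (for example CL_DEVICE_MAX_MEM_ALLOC_SIZE from clGetDeviceInfo), and a single device buffer could be reused across passes.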
Upvotes: 3