Reputation: 13
Assume that a CPU pointer(cpu_ptr_) already exists, then I create a buffer for gpu(cl_gpu_mem_). The problem is that when I map the gpu buffer to a cpu pointer(mapped_ptr), the mapped_ptr is not equal to the original pointer (cpu_ptr_), which causes that CHECK_EQ(mapped_ptr, cpu_ptr_) raises an error.
cl_gpu_mem_ = clCreateBuffer(ctx.handle().get(),
CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
size_, cpu_ptr_, &err);
void *mapped_ptr = clEnqueueMapBuffer(
ctx.get_queue().handle().get(),
cl_gpu_mem_,
true,
CL_MAP_READ | CL_MAP_WRITE,
0, size_, 0, NULL, NULL, NULL);
CHECK_EQ(mapped_ptr, cpu_ptr_)
<< "Device claims it supports zero copy"
<< " but failed to create correct user ptr buffer";
I don't know why this error occurs at all. Would you please give me any advice for this problem, or any solution to it. Thank you very much.
Upvotes: 1
Views: 72
Reputation: 2796
OpenCL implementations are free to mirror the host pointer (making it non-zero copy). On devices that support true zero copy (e.g. Intel GPU), there are still typically constraints that impose whether we can really use that host allocation directory or must mirror it. On Intel the host address must be page aligned and a the length must be a multiple of 128 bytes (an even cacheline). (I typically just page align both.) I am not sure what AMD and other's requirements are.
Look into aligned_alloc
or overallocate a couple extra pages via and use a page boundary for the base.
Upvotes: 1