Reputation: 829
I'm new to C, C++ and OpenCL. I have two questions.
(1) If I have several host input variables, such as long and double arrays, is there any way to avoid copying each one to the device (the traditional OpenCL way, i.e. clCreateBuffer etc.) and instead map some memory from the device into the host and write host pointers into that device memory, to then access within the kernel? I'm told there is, but I cannot figure out the code to do this.
Below I have an example input data array. The objective is to somehow relay a pointer to it to the device without copying it in any way as the various input data variables could be very large. I allocate a buffer, enqueue a map buffer, get a device pointer back but then I'm unsure of how to pass the input to that device pointer. I've used a type of cl_long for the device pointer which may be wrong.
cl_long inputData[2] = {1,2};
cl_mem inputBuffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
sizeof(cl_long) * 2, NULL, NULL);
cl_long* inputMap = (cl_long*) clEnqueueMapBuffer(
queue, inputBuffer, CL_TRUE, CL_MAP_WRITE, 0,
sizeof(cl_long) * 2, 0, NULL, NULL, NULL);
// what to do here?
clEnqueueUnmapMemObject(queue, inputBuffer, inputMap, 0, NULL, NULL);
I've used space for two cl_longs above, but in reality, if I'm passing pointers to host data, what would I allocate here?
(2) What about packing pointers to more than one input variable into the same memory space returned by clEnqueueMapBuffer? Say I have a long array and a double array: can I pass pointers to both of them into the same piece of mapped device memory?
I would really appreciate some example source code with particular elaboration on host and device memory and how they are kept in sync as well as on pointers as I'm a bit new to them.
P.S. I have seen another example on SO of writing host data into device mapped memory (http://stackoverflow.com/questions/5673794/opencl-mapped-memory-doesnt-work) but again it uses a manual write of data into the memory equivalent to copying.
UPDATE: In response to Raj's comment (replying here in case my comment is too long) I've already begun using that flag but probably have a mistake in my pointer code somewhere.
double a[2] = { 3.0, 6.0 } ;
size_t pointerSize = sizeof(double*);
cl_mem bufA = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, pointerSize, NULL, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &bufA);
double* pA = (double*) clEnqueueMapBuffer(queue, bufA, CL_TRUE, CL_MAP_WRITE, 0, pointerSize, 0, NULL, NULL, &err);
*pA = *a;
At this point if I print a[0] and a[1] in the kernel itself I get:
a[0]=3.000000
a[1]=-0.000000
a[1] is obviously wrong. Any ideas what I'm doing wrong?
Upvotes: 0
Views: 1850
Reputation: 3462
So the answer would be to create a buffer using clCreateBuffer and pass it the CL_MEM_ALLOC_HOST_PTR flag. Check out the description in the OpenCL clCreateBuffer API documentation. On the CUDA architecture this is similar to cudaHostAlloc, which allocates memory on the host that is also accessible to the GPU device. More information can be found on this webpage.
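As for the update in the question, the wrong a[1] most likely comes from two things: the buffer is sized as sizeof(double*), which on a 64-bit host is 8 bytes and so holds only one double, and `*pA = *a;` copies only a[0]. A kernel in OpenCL 1.x cannot dereference a host pointer, so the mapped region has to hold the data itself. Below is a minimal sketch in plain C of the difference between the two copies. The function names copy_first_only and copy_all are hypothetical, and an ordinary array stands in for the pointer that clEnqueueMapBuffer would return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors `*pA = *a;` from the question: copies one double (src[0]) only.
 * `mapped` stands in for the pointer returned by clEnqueueMapBuffer. */
void copy_first_only(double *mapped, const double *src)
{
    assert(mapped != NULL && src != NULL);
    *mapped = *src;
}

/* Copies all `count` elements. The mapped region must be at least
 * count * sizeof(double) bytes, not sizeof(double*). */
void copy_all(double *mapped, const double *src, size_t count)
{
    assert(mapped != NULL && src != NULL);
    memcpy(mapped, src, count * sizeof *src);
}
```

Applied to the question's code, this would mean creating and mapping bufA with sizeof(a) bytes instead of pointerSize, calling something like copy_all(pA, a, 2), and then calling clEnqueueUnmapMemObject before the kernel is enqueued.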
Upvotes: 2