Reputation: 569
My computer has a GeForce 1080Ti. With 11GB of VRAM, I don't expect a memory issue, so I'm at a loss to explain why the following breaks my code.
I execute the kernel on the host with this code.
cl_mem buffer = clCreateBuffer(context.GetContext(), CL_MEM_READ_WRITE, n * n * sizeof(int), NULL, &error);
error = clSetKernelArg(context.GetKernel(myKernel), 1, n * n, m1);
error = clSetKernelArg(context.GetKernel(myKernel), 0, sizeof(cl_mem), &buffer);
error = clEnqueueNDRangeKernel(context.GetCommandQueue(0), context.GetKernel(myKernel), 1, NULL, 10, 10, 0, NULL, NULL);
clFinish(context.GetCommandQueue(0));
error = clEnqueueReadBuffer(context.GetCommandQueue(0), buffer, true, 0, n * n * sizeof(int), results, 0, NULL, NULL);
results
is a pointer to an n-by-n int
array. m1 is a pointer to an n-by-n-bit array. The variable n is divisible by 8, so we can interpret the array as a char
array.
The first ten values of the array are set to 1025 by the kernel (the value isn't important):
__kernel void PopCountProduct (__global int *results)
{
results[get_global_id(0)] = 1025;
}
When I print out the result on the host, the first 10 indices are 1025. All is well and good.
Suddenly it stops working when I introduce an additional argument:
__kernel void PopCountProduct (__global int *results, __global char *m)
{
results[get_global_id(0)] = 1025;
}
Why is this happening? Am I missing something crucial about OpenCL?
Upvotes: 0
Views: 54
Reputation: 74
You can't pas host pointer to clSetKernelArg
in OpenCL 1.2. Similar thing can only be done in OpenCL 2.0+ by clSetKernelArgSVMPointer
with SVM pointer if supported. But most probable making a buffer object on GPU and copying host memory to it is what you need.
Upvotes: 1