Reputation: 1344
I'm trying to copy an image using OpenCL:
std::string kernelCode =
"void kernel copy(global const int* image, global int* result)"
"{"
"result[get_global_id(0)] = image[get_global_id(0)];"
"}";
The image contains 200 * 300 pixels.
The maximum number of work-items is 4100 according to CL_DEVICE_MAX_WORK_GROUP_SIZE.
In the queue:
int size = _originalImage.width() * _originalImage.height();
//...
queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(size), cl::NullRange);
Gives segfault.
queue.enqueueNDRangeKernel(imgProcess, cl::NullRange, cl::NDRange(10000), cl::NullRange);
Runs fine, but it gives back only part of the image.
What am I missing here?
Upvotes: 0
Views: 762
Reputation: 8796
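As you have already noted, your CL_DEVICE_MAX_WORK_GROUP_SIZE is less than the number of work-items you want to start. Strictly speaking, that value limits the size of a single work-group rather than the global range, so it is also worth checking that both buffers really hold all 60000 elements. The segfault indicates an error inside the runtime. You can have OpenCL report errors as C++ exceptions if you add the following define at the beginning of your source file, before you include any OpenCL headers (with the newer cl2.hpp wrapper the macro is CL_HPP_ENABLE_EXCEPTIONS instead):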
#define __CL_ENABLE_EXCEPTIONS
The second call clearly copies only the first 10000 pixels of your image instead of all 60000. If you want to use only 10000 work-items per launch, you need to make this call six times, with an adjusted NDRange offset each time.
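To illustrate the six-launch idea, here is a minimal sketch of how the offsets could be computed. chunkRanges is a hypothetical helper, not part of OpenCL, and the enqueue call in the trailing comment assumes the queue, imgProcess and size objects from the question. It also assumes OpenCL 1.1 or later, where a non-null global offset is allowed and get_global_id(0) includes that offset.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an OpenCL function): split `total` work-items into
// launches of at most `chunk` items and return one (offset, count) pair each.
std::vector<std::pair<std::size_t, std::size_t>>
chunkRanges(std::size_t total, std::size_t chunk)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (chunk == 0) return ranges;  // nothing sensible to launch
    for (std::size_t offset = 0; offset < total; offset += chunk) {
        // The last launch may be shorter than `chunk`.
        std::size_t count = (total - offset < chunk) ? total - offset : chunk;
        ranges.emplace_back(offset, count);
    }
    return ranges;
}

// Intended use with the objects from the question (assumed, not run here):
// for (const auto& r : chunkRanges(size, 10000))
//     queue.enqueueNDRangeKernel(imgProcess, cl::NDRange(r.first),
//                                cl::NDRange(r.second));
```

For the 200 * 300 image this yields six ranges starting at 0, 10000, ..., 50000, so the unmodified kernel would cover every pixel once.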
Generally, I would advise either using cl::copy to copy the image or modifying your kernel so that each work-item copies multiple pixels.
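A rough sketch of the "multiple pixels per work-item" variant could look like the following. The extra count argument and the name stridedKernelCode are assumptions added for illustration, and emulateStridedCopy is only a host-side emulation of the same index pattern, not OpenCL code. On the host you would additionally have to pass the pixel count as the third kernel argument.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Variant of the question's kernel: each work-item walks the image with a
// stride equal to the global size. `count` is an added (assumed) argument
// holding the total number of pixels.
const std::string stridedKernelCode =
    "void kernel copy(global const int* image, global int* result, const int count)"
    "{"
    "    for (int i = get_global_id(0); i < count; i += get_global_size(0))"
    "        result[i] = image[i];"
    "}";

// Host-side emulation of the same loop, only to show that `globalSize`
// work-items together copy every pixel exactly once.
std::vector<int> emulateStridedCopy(const std::vector<int>& image,
                                    std::size_t globalSize)
{
    std::vector<int> result(image.size(), 0);
    for (std::size_t gid = 0; gid < globalSize; ++gid)            // one pass per work-item
        for (std::size_t i = gid; i < image.size(); i += globalSize)
            result[i] = image[i];
    return result;
}
```

With this pattern a single launch of 10000 work-items would handle all 60000 pixels, six per work-item, and the bound check on count keeps a global size larger than the image from writing out of range.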
Furthermore, passing cl::NullRange as the local work-group size should simply let the OpenCL implementation pick one. As the local work-group size does not matter in your case, I think it is best to leave out this parameter and use the version of enqueueNDRangeKernel with only three arguments (omitting the last one).
Upvotes: 3