Reputation: 1109
I am having trouble understanding OpenCL 2D workgroup dimensions.
I want to create an N x N matrix and give each row to a set of workgroups for computation. For example, if I have a 1000 x 1000 matrix, I would like to have 10 workgroups per row (so each workgroup would compute 100 elements) and 10000 workgroups in total (10 * 1000).
This is a piece of my code:
size_t global_pattern[] = {n,n,0}; //My matrix pattern
size_t group_pattern[] = {workgroups_per_row, n, 0}; //My workgroups pattern
And some lines below:
err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_pattern,
group_pattern, 0, NULL, NULL);
This call fails. The error code is -54, which corresponds to CL_INVALID_WORK_GROUP_SIZE (#define CL_INVALID_WORK_GROUP_SIZE -54).
Is my initialization of the group_pattern array correct? I mean, does the first element of the array refer to the x coordinate and the second one to the y coordinate, as I am assuming? What is the math behind it?
For debugging purposes I also tried to run it with an empty kernel (just to be sure that my kernel code is not the problem). I also tried swapping the first and second elements of the group_pattern array.
Upvotes: 2
Views: 2887
Reputation: 8410
When you say:
size_t global_pattern[] = {n,n,0}; //My matrix pattern
size_t group_pattern[] = {workgroups_per_row, n, 0}; //My workgroups pattern
And then use it like:
err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_pattern, group_pattern, 0, NULL, NULL);
This is wrong. The argument you have to pass to clEnqueueNDRangeKernel is the local_size,
that is, the size of one work-group, not the number of groups you want to split your space into.
So it would be like this for your case:
size_t global_size[] = {n, n}; // Total number of work-items per dimension (the matrix)
size_t group_pattern[] = {workgroups_per_row, n}; // Number of work-groups per dimension
size_t local_size[] = {global_size[0]/group_pattern[0], global_size[1]/group_pattern[1]}; // Size of one work-group
err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_size, local_size, 0, NULL, NULL);
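To make the arithmetic explicit, here is a minimal sketch of the same division as a plain C helper. The name compute_local_size and its return convention are hypothetical, not part of the OpenCL API; it only assumes that each dimension should split evenly into the requested number of work-groups.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): derive the size of ONE
 * work-group from the global size and the number of work-groups wanted
 * along each dimension.
 * Returns 0 on success, -1 if a dimension cannot be split evenly. */
static int compute_local_size(size_t dims,
                              const size_t *global_size,
                              const size_t *num_groups,
                              size_t *local_size)
{
    for (size_t d = 0; d < dims; ++d) {
        /* Reject a zero group count and uneven splits. */
        if (num_groups[d] == 0 || global_size[d] % num_groups[d] != 0)
            return -1;
        local_size[d] = global_size[d] / num_groups[d];
    }
    return 0;
}
```

For the 1000 x 1000 example with 10 work-groups per row, this gives a local size of {100, 1}: dimension 0 is x, dimension 1 is y. Inside the kernel, and assuming no global offset is passed, the indices are related by get_global_id(d) == get_group_id(d) * get_local_size(d) + get_local_id(d).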
EDIT: If you have a CL_INVALID_WORK_GROUP_SIZE, then the causes could be that:
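Two conditions that the OpenCL specification associates with this error can be checked on the host before enqueueing. The sketch below is only an illustration: check_local_size and its enum are hypothetical names, and max_work_group_size is assumed to be a value you have already queried, for example with clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE or with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE. The divisibility rule applies to OpenCL 1.x; later versions may allow non-uniform work-groups.

```c
#include <stddef.h>

/* Hypothetical result codes for the pre-flight check below. */
enum local_size_check {
    LOCAL_SIZE_OK = 0,
    LOCAL_SIZE_NOT_DIVISOR = 1, /* global size not a multiple of local size */
    LOCAL_SIZE_TOO_LARGE = 2    /* work-items per group exceed the limit    */
};

/* Hypothetical helper (not an OpenCL function): checks a local size
 * against the global size and a previously queried work-group limit. */
static enum local_size_check check_local_size(size_t dims,
                                              const size_t *global_size,
                                              const size_t *local_size,
                                              size_t max_work_group_size)
{
    size_t items_per_group = 1;
    for (size_t d = 0; d < dims; ++d) {
        if (local_size[d] == 0 || global_size[d] % local_size[d] != 0)
            return LOCAL_SIZE_NOT_DIVISOR;
        /* The limit applies to the product over all dimensions. */
        items_per_group *= local_size[d];
    }
    if (items_per_group > max_work_group_size)
        return LOCAL_SIZE_TOO_LARGE;
    return LOCAL_SIZE_OK;
}
```

With the question's values, passing {workgroups_per_row, n} = {10, 1000} as the local size asks for 10000 work-items per group, which would be rejected by any device whose limit is lower than that (a limit of 1024 is used here purely as an assumed example), whereas {100, 1} needs only 100.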
Upvotes: 4