Kami

Reputation: 1109

OpenCL 2D workgroup dimensions

I am having trouble understanding OpenCL 2D workgroup dimensions.

I want to create an N x N matrix and give each row to a set of workgroups for computation. For example, if I have a 1000 x 1000 matrix, I would like to have 10 workgroups per row (so each workgroup would compute 100 elements) and 10000 workgroups in total (10 * 1000).

This is a piece of my code:

size_t global_pattern[] = {n,n,0}; //My matrix pattern
size_t group_pattern[] = {workgroups_per_row, n, 0}; //My workgroups pattern

And some lines below:

err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_pattern, 
                             group_pattern, 0, NULL, NULL);

This call fails. The error code is -54, which corresponds to #define CL_INVALID_WORK_GROUP_SIZE -54.

Is my initialization of the group_pattern array correct? I mean, does the first element of the array refer to the x coordinate and the second one to the y coordinate, as I am assuming? What is the math behind it?

For debugging purposes I also tried to run it with an empty kernel (just to be sure that my code is not the problem). I also tried swapping the first and second elements of the group_pattern array.

Upvotes: 2

Views: 2887

Answers (1)

DarkZeros

Reputation: 8410

When you say:

size_t global_pattern[] = {n,n,0}; //My matrix pattern
size_t group_pattern[] = {workgroups_per_row, n, 0}; //My workgroups pattern

And then use it like:

err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_pattern, group_pattern, 0, NULL, NULL);

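This is wrong. The local_work_size argument of clEnqueueNDRangeKernel is the local size, i.e. the number of work-items in one workgroup along each dimension. It is not the number of groups you want to split your space into. With your values, each workgroup would contain workgroups_per_row * n work-items (10 * 1000 = 10000 in your example), which likely exceeds the maximum work-group size of your device.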

So it would be like this for your case:

size_t global_size[] = {n, n}; // Global size: one work-item per matrix element
size_t group_pattern[] = {workgroups_per_row, n}; // Number of workgroups in each dimension
size_t local_size[] = {global_size[0]/group_pattern[0], global_size[1]/group_pattern[1]}; // Work-items per workgroup in each dimension

err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_size, local_size, 0, NULL, NULL);
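The division above can be wrapped in a small helper that also rejects group counts that do not divide the global size evenly. This is only a sketch: the function name local_size_from_groups and its return convention are hypothetical, not part of the OpenCL API.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenCL function): derive the per-dimension
 * local size from the global size and the desired number of workgroups.
 * Returns 0 on success, -1 if a group count is zero or does not divide
 * the global size evenly in some dimension.
 */
static int local_size_from_groups(const size_t *global, const size_t *groups,
                                  size_t *local, size_t dims)
{
    for (size_t d = 0; d < dims; ++d) {
        if (groups[d] == 0 || global[d] % groups[d] != 0)
            return -1;
        /* Work-items per workgroup along dimension d. */
        local[d] = global[d] / groups[d];
    }
    return 0;
}
```

For the 1000 x 1000 case in the question, with 10 workgroups per row and 1000 rows of workgroups, this yields a local size of {100, 1}, which is what would then be passed as local_work_size.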

EDIT: If you get CL_INVALID_WORK_GROUP_SIZE, the likely causes are:

  1. You are specifying a local size that is not an integer divisor of the global size in some dimension
  2. The device does not support a local size as big as the one you defined (the product of the local sizes exceeds the maximum work-group size)

Upvotes: 4
