clEnqueueNDRangeKernel with work dimension=2

Question

I am writing code to add two matrices of dimension 1024x1024 each, so my work dimension has to be 2 and the global work size should be 1024x1024. I want to set the size of each work-group to 64x64. How do I achieve that?

So my call should look something like this:

clEnqueueNDRangeKernel(cl_command_queue command_queue,cl_kernel kernel,cl_uint work_dim,const size_t *global_work_offset,
                       const size_t *global_work_size,const size_t *local_work_size,
                       cl_uint num_events_in_wait_list,const cl_event *event_wait_list,cl_event *event)

where local_work_size=64x64, global_work_size=1024x1024 and work_dim=2. How do I access individual elements in my kernel code?

This is my kernel code:

__kernel void hello(__global int **A, __global int **B, __global int **C)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    C[x][y] = A[x][y] + B[x][y];
}

jprice · Accepted Answer

Your kernel launch would look like this:

size_t global[2] = {1024, 1024};
size_t local[2]  = {64, 64};
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, local, 0, NULL, NULL);

and your kernel would retrieve its indices like this:

kernel void foo(...)
{
  int x = get_global_id(0);
  int y = get_global_id(1);
  ...
}
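To make the index values concrete: with a zero global offset, each global ID is the work-group ID times the local size plus the local ID within the group. The plain C sketch below mirrors that arithmetic for one dimension on the host. The names work_item_coord, split_global_id and join_global_id are hypothetical helpers for illustration, not OpenCL API.

```c
#include <stddef.h>

/* Hypothetical helper type: one work-item's position along one dimension. */
typedef struct {
    size_t group_id; /* the value get_group_id(d) would return */
    size_t local_id; /* the value get_local_id(d) would return */
} work_item_coord;

/* Split a global ID into its group and local parts.
   Assumes a zero global offset and a uniform local size. */
static work_item_coord split_global_id(size_t global_id, size_t local_size)
{
    work_item_coord c;
    c.group_id = global_id / local_size;
    c.local_id = global_id % local_size;
    return c;
}

/* Rebuild the global ID: group_id * local_size + local_id. */
static size_t join_global_id(work_item_coord c, size_t local_size)
{
    return c.group_id * local_size + c.local_id;
}
```

For example, with a local size of 16 in dimension 0, the work-item with global ID 70 sits in work-group 4 at local ID 6.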

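As an aside, most OpenCL devices that I've come across have a maximum work-group size of 1024 work-items, which would mean that they wouldn't support a work-group size of 64x64 (4096 work-items). You can query the limit for your device by calling clGetDeviceInfo with CL_DEVICE_MAX_WORK_GROUP_SIZE.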
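A quick way to sanity-check a candidate work-group size before launching is sketched below in plain C. It assumes OpenCL 1.x rules, where each global size must be evenly divisible by the matching local size and the product of the local sizes must not exceed the device limit. The function name local_size_ok is hypothetical, and max_work_group_size stands in for whatever your device reports for CL_DEVICE_MAX_WORK_GROUP_SIZE. Per-dimension limits (CL_DEVICE_MAX_WORK_ITEM_SIZES) are not checked here.

```c
#include <stddef.h>

/* Returns 1 if the local size looks acceptable under OpenCL 1.x rules, else 0.
   max_work_group_size is assumed to come from CL_DEVICE_MAX_WORK_GROUP_SIZE. */
static int local_size_ok(size_t work_dim,
                         const size_t *global,
                         const size_t *local,
                         size_t max_work_group_size)
{
    size_t items_per_group = 1;
    for (size_t d = 0; d < work_dim; ++d) {
        /* Each global size must be a whole multiple of the local size. */
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
        items_per_group *= local[d];
    }
    /* The whole work-group must fit within the device limit. */
    return items_per_group <= max_work_group_size;
}
```

With a limit of 1024, a 64x64 local size is rejected (4096 work-items), while 32x32 or 16x16 would pass for a 1024x1024 global size.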

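Since OpenCL buffers are one-dimensional, and kernel arguments cannot be declared as pointers to pointers, you need to compute your linear array indices manually. Here's how your simple matrix addition kernel would look, with the row width passed in as an extra argument: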

__kernel void hello(__global int *A, __global int *B, __global int *C, int width)
{
  int x = get_global_id(0);
  int y = get_global_id(1);
  int index = x + y*width;
  C[index] = A[index] + B[index];
}
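If you want to check the device results, a host-side reference is handy. The plain C sketch below uses the same x + y*width indexing as the kernel, with the two nested loops standing in for the 2D NDRange (one iteration per work-item). The name matrix_add_reference and the height parameter are my own additions for illustration. It assumes the matrices are stored contiguously in row-major order.

```c
#include <stddef.h>

/* CPU reference for the matrix addition kernel.
   Each (x, y) iteration plays the role of one work-item. */
static void matrix_add_reference(const int *A, const int *B, int *C,
                                 size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Same linear index the kernel computes. */
            size_t index = x + y * width;
            C[index] = A[index] + B[index];
        }
    }
}
```

After reading the output buffer back from the device, you can compare it element by element against the array this function fills in.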
