Nike

Reputation: 475

Executing different kernels on different GPUs simultaneously

Basically, I have two GPUs and I want to execute some kernels on each of them. I don't want the GPUs to work on the same kernel with each doing part of it (I don't know whether that is even possible); just in case it is, I don't even want to see that behavior.

I just want to make sure that both devices are being exercised. I have created a context and the command queues for both of them, but I see only one kernel being executed, which means only one device is being used. This is how I have done it:

cl_device_id *device;
cl_kernel *kernels;
...
// creating context.  
context = clCreateContext(0, num_devices, device, NULL, NULL, &error);
...
// creating command queues for all kernels
for(int i = 0; i<num_kernels; i++)
    cmdQ[i] = clCreateCommandQueue(context, *device, 0, &error);
...
// enqueue kernels 
error = clEnqueueNDRangeKernel(*cmdQ, *kernels, 2, 0, glbsize, 0, 0, NULL, NULL);

Am I going the correct way?

Upvotes: 2

Views: 1614

Answers (1)

matthias

Reputation: 2181

It depends on how you actually filled your device array. If you initialized it correctly, creating a single context that spans both devices is correct.

Unfortunately, you have the wrong idea about kernels and command queues. A kernel is created from a program, which is built for a particular context. A queue, on the other hand, is used to communicate with one specific device. What you want is one queue per device, not one per kernel:

for (int i = 0; i < num_devices; i++)
    cmdQ[i] = clCreateCommandQueue(context, device[i], 0, &error);

Now you can enqueue the different (or same) kernels on different devices via the corresponding command queues:

clEnqueueNDRangeKernel(cmdQ[0], kernels[0], /* ... */);
clEnqueueNDRangeKernel(cmdQ[1], kernels[1], /* ... */);

To sum up the terms:

  • A cl_context is created for a particular cl_platform_id and is like a container for a subset of devices,
  • a cl_program is created and built for a cl_context and its associated devices
  • a cl_kernel is extracted from a cl_program but can only be used on devices associated with the program's context,
  • a cl_command_queue is created for a specific device belonging to a certain context,
  • memory operations and kernel calls are enqueued in a command queue and executed on the corresponding device.

Upvotes: 7
