lhahne

Reputation: 6029

Is it possible to span an OpenCL kernel to run concurrently on CPU and GPU?

Let's assume that I have a computer which has a multicore processor and a GPU. I would like to write an OpenCL program which runs on all cores of the platform. Is this possible, or do I need to choose a single device on which to run the kernel?

Upvotes: 4

Views: 1879

Answers (4)

doug65536

Reputation: 6771

One context can only belong to one platform. If your multi-device code needs to work across platforms (for example, an Intel CPU OpenCL platform and an NVidia GPU platform) then you need separate contexts.

However, if the GPU and CPU happen to be in the same platform, then yes, you can use one context.

If you are using multiple devices on the same platform (two identical GPUs, or two GPUs from the same manufacturer) then you can share the context - as long as they both come from a single clGetDeviceIDs call.

EDIT: I should add that a GPU+CPU context doesn't mean any automatically managed CPU+GPU execution. Typically, it is best practice to let the driver allocate a memory buffer that the GPU can DMA for maximum performance. When the CPU and GPU are in the same context, you can share those buffers between the two devices.

You still have to split the workload up yourself. My favorite load-balancing technique uses events. Every n work items, attach an event object to a command (or enqueue a marker), and wait for the event you attached n work items ago (the prior one). If you didn't have to wait, you should increase n for that device; if you did have to wait, you should decrease n. This limits the queue depth, and n will hover around the perfect depth to keep the device busy. You need to do this anyway to avoid causing GUI render starvation. Just keep n commands in each command queue (where the CPU and GPU have separate values of n) and the work will divide itself perfectly.

Upvotes: 2

DarkZeros

Reputation: 8410

You cannot span a single kernel launch across multiple devices. But if the code you are running is not dependent on other results (e.g. processing blocks of 16 kB of data, each of which needs heavy processing), you can launch the same kernel on both the GPU and CPU, putting some blocks on the GPU and some on the CPU.

That way you should get a performance boost.

You can do that by creating a cl_context shared by the CPU and GPU, with two command queues (one per device).

This is not applicable to all kernels. Sometimes the kernel code operates on all of the input data at once and cannot be separated into independent parts or chunks.

Upvotes: 1

elmattic

Reputation: 12174

No, you can't automagically span a kernel across both CPU and GPU; a launch goes to one or the other.

You could split the work yourself, but this involves manually creating and managing two command queues (one for each device).

See this thread: http://devforums.amd.com/devforum/messageview.cfm?catid=390&threadid=124591&messid=1072238&parentid=0&FTVAR_FORUMVIEWTMP=Single

Upvotes: 2

Dr. Snoopy

Reputation: 56347

In theory, yes, you can: the CL API allows it. But the platform/implementation must support it, and I don't think most CL implementations do.

To do it, get the cl_device_id of the CPU device and the GPU device, and create a context with those two devices using clCreateContext.
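A minimal host-code sketch of that setup might look like the following. It assumes platform 0 exposes both a CPU and a GPU device (often not the case in practice, which is the limitation this answer points out), omits all error checking, and uses the OpenCL 1.x clCreateCommandQueue API:

```c
/* Sketch only: assumes one platform exposes both a CPU and a GPU device,
 * and omits error checking for brevity. Requires an OpenCL runtime. */
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform;
    clGetPlatformIDs(1, &platform, NULL);

    cl_device_id devices[2];
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &devices[0], NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &devices[1], NULL);

    /* One context holding both devices... */
    cl_context ctx = clCreateContext(NULL, 2, devices, NULL, NULL, NULL);

    /* ...but each device still needs its own command queue. */
    cl_command_queue cpu_q = clCreateCommandQueue(ctx, devices[0], 0, NULL);
    cl_command_queue gpu_q = clCreateCommandQueue(ctx, devices[1], 0, NULL);

    clReleaseCommandQueue(gpu_q);
    clReleaseCommandQueue(cpu_q);
    clReleaseContext(ctx);
    return 0;
}
```

If the two clGetDeviceIDs calls succeed on the same platform, buffers created in ctx are visible to both queues, but you still enqueue work to each device explicitly.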

Upvotes: 2
