Michael Clerx

Reputation: 3056

OpenCL: Solving two different-sized problems simultaneously on a GPU

For a problem I'm working on I need to solve two sub-problems: Sub1 on an NxM grid, and Sub2 on a Kx1 grid. The problem is that these sub-problems need to communicate after every step of the solution process, so I need to run them simultaneously.

The end result should look like this:

  1. Sub1 is solved for time t
  2. Sub2 is solved for time t
  3. An interaction term between Sub1 and Sub2 for time t+1 is calculated

This is then repeated for t+1, using the newly calculated interaction term, and then for t+2, t+3, and so on. All the data involved is stored in global device memory, so no copying to and from the device is needed between steps.
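In pseudocode, the loop I'm trying to run looks like this (the function names are just placeholders for whatever OpenCL launches would go there):

    /* The time loop I'm after; solve_sub1 etc. are placeholders */
    for (int t = 0; t < t_max; t++) {
        solve_sub1(t);         /* NxM grid, uses interaction term for t */
        solve_sub2(t);         /* Kx1 grid, uses interaction term for t */
        update_interaction(t); /* produces the term for step t+1 */
    }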

My problem is, how do I tell OpenCL I want to work on two different sized problems at the same time?

Upvotes: 0

Views: 67

Answers (1)

DarkZeros

Reputation: 8410

Does it really need to be "at the same time"?

This is a common misunderstanding of OpenCL and parallel systems. Making everything run in parallel is not always a good choice. In fact, in 99% of cases nothing needs to run truly simultaneously (unless some real-time constraint exists), and forcing it only slows things down.

Depending on the sizes and amount of work of Sub1 and Sub2:

  • If they take very little time or apply to a small amount of data:
    1. Merge both into one kernel and scale the work-items as needed. Some of them will be idle, but the loss is small and is compensated by being able to share local/private memory between Sub1 and Sub2 (see the kernel sketch after this list).
  • If they are big chunks of processing:
    1. Split the work into 2-3 different kernels, with different arguments, etc.
    2. Communicate between the two processes through global memory.
    3. Launch the 2 kernels with different global sizes (so each fits its amount of work exactly).
    4. When both have finished, launch another kernel on the result to generate the data for the next iteration.
    5. You can even enqueue everything at once in an in-order queue and the kernels will run in order without CPU intervention. That is the easiest approach (see the host-code sketch below).
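
For the merged-kernel option, a minimal sketch of what that can look like in OpenCL C. The kernel name, buffer arguments, and the update math are placeholders, not code from the question; the point is only the branching on the global id:

    /* One kernel covering both sub-problems. Launch with a 1D global
     * size of max(n*m, k); work-items outside a range simply skip it. */
    __kernel void solve_both(__global float *sub1,          /* N*M values */
                             __global float *sub2,          /* K values  */
                             __global const float *coupling,
                             const int n, const int m, const int k)
    {
        int gid = get_global_id(0);

        if (gid < n * m) {
            /* Sub1 update for this grid point (placeholder math) */
            sub1[gid] += coupling[0];
        }
        if (gid < k) {
            /* Sub2 update; low-numbered work-items handle both */
            sub2[gid] += coupling[0];
        }
    }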

I would say that in your case you should go for the multi-kernel approach.
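
For that route, a hedged host-side sketch in plain C against the OpenCL 1.x API. The queue, the three kernel objects, and the interaction kernel's global size are assumptions here; the kernel arguments are presumed already set (with clSetKernelArg) to the shared global buffers:

    #include <CL/cl.h>

    /* One solver iteration: Sub1 and Sub2 are launched with their own
     * global sizes, followed by the interaction kernel. On an in-order
     * queue (the default) the three launches run back to back without
     * CPU intervention. */
    static void enqueue_iteration(cl_command_queue queue,
                                  cl_kernel sub1, cl_kernel sub2,
                                  cl_kernel interact,
                                  size_t n, size_t m, size_t k)
    {
        size_t sub1_size[2] = { n, m };  /* NxM grid */
        size_t sub2_size[1] = { k };     /* Kx1 grid */

        clEnqueueNDRangeKernel(queue, sub1, 2, NULL, sub1_size, NULL,
                               0, NULL, NULL);
        clEnqueueNDRangeKernel(queue, sub2, 1, NULL, sub2_size, NULL,
                               0, NULL, NULL);
        /* Interaction term for the next step; the size here is
         * problem-dependent, Kx1 is just an assumption */
        clEnqueueNDRangeKernel(queue, interact, 1, NULL, sub2_size, NULL,
                               0, NULL, NULL);
    }

You can call this in a loop to enqueue many iterations ahead of time and only call clFinish(queue) at the end; since all buffers live in global device memory, nothing is copied back between steps.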

Upvotes: 1
