Poul K. Sørensen

OpenCL - Main kernel filling buffers and running sub-kernels

Being new to OpenCL, I would like to know whether the following scenario is possible.

In memory, 10 buffers of length 10000 (or a 10xN image buffer) are created to work as a cache.

The first kernel fills a row in the cache and asks another kernel to do some work on that row. When the second kernel is done, the first kernel computes a new row to replace the old one, and the same procedure repeats until the first kernel has no more tasks.

Does this scenario make sense, and is it possible with GPU programming?

Answers (1)

prunge

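OpenCL 1.x does not let one kernel launch another kernel, but you have some options. (OpenCL 2.0 later added device-side enqueue through the built-in `enqueue_kernel` function, but not every device supports it.)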

  1. Have the first kernel call an ordinary non-kernel function. Work distribution between work items does not change here: if you have 10 parallel work items (threads) executing, one per row, then each thread keeps operating on its own row inside the non-kernel function.

  2. Enqueue multiple kernels one after another, coordinated by the host. This allows work to be redistributed among threads between kernels, but it can be more complicated to set up than option 1.
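
As a rough illustration of option 1, the plain-C sketch below emulates a kernel that fills its own row and then calls an ordinary (non-kernel) helper on that same row. The loop over `gid` stands in for the NDRange launch; `fill_then_process_kernel`, `process_row`, and the fill/process formulas are hypothetical placeholders. In real OpenCL C the kernel would be declared `__kernel` and get its row index from `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 10      /* one work item per cache row (from the question) */
#define COLS 10000   /* row length (from the question) */

static float cache[ROWS][COLS];

/* Hypothetical non-kernel helper: in OpenCL C this is a plain function
   called from the kernel, running inside the calling work item. */
static void process_row(float *row, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        row[i] = row[i] * 2.0f;      /* placeholder "work" on the row */
}

/* Emulates one work item. In OpenCL C this would be something like
   __kernel void fill_then_process(__global float *cache)
   with gid = get_global_id(0). */
static void fill_then_process_kernel(size_t gid)
{
    assert(gid < ROWS);
    float *row = cache[gid];
    for (size_t i = 0; i < COLS; ++i)
        row[i] = (float)(gid + i);   /* placeholder "fill" step */
    process_row(row, COLS);          /* same work item, same row */
}

/* Serial stand-in for enqueuing the kernel with a global size of ROWS. */
static void launch_option1(void)
{
    for (size_t gid = 0; gid < ROWS; ++gid)
        fill_then_process_kernel(gid);
}
```

Because both the fill and the processing of a row stay inside one work item, no more than `ROWS` (10) work items are ever active, however long the rows are.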

One of the keys to writing fast OpenCL code is splitting your work into work items; usually, the more the better. If your first kernel, which fills the row, can only be split into 10 work items, but your second kernel, which processes that row, can be split into thousands of work items, then you definitely want option 2: the second part can then be spread efficiently across devices with a large number of cores, such as modern GPUs. A small number of work items, such as 10, can only use a fraction of that processing power.
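
A minimal sketch of option 2, again emulated in plain C: the host runs a fill pass with one work item per row, then a separate processing pass with one work item per element, so the second pass has `ROWS * COLS` work items to spread over the device. The kernel names and formulas are hypothetical. On a real device the two loops would correspond to two `clEnqueueNDRangeKernel` calls on an in-order command queue, which ensures the fill pass completes before the processing pass starts.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 10
#define COLS 10000

static float cache[ROWS][COLS];

/* Pass 1 (hypothetical): one work item per row, global size = ROWS. */
static void fill_kernel(size_t gid)
{
    for (size_t i = 0; i < COLS; ++i)
        cache[gid][i] = (float)(gid + i);
}

/* Pass 2 (hypothetical): one work item per element,
   global size = ROWS * COLS. */
static void process_kernel(size_t gid)
{
    assert(gid < (size_t)ROWS * COLS);
    size_t row = gid / COLS;
    size_t col = gid % COLS;
    cache[row][col] *= 2.0f;
}

/* Host side: two launches with different global sizes.
   Returns the number of work items used by the processing pass. */
static size_t launch_option2(void)
{
    for (size_t gid = 0; gid < ROWS; ++gid)            /* launch #1 */
        fill_kernel(gid);

    size_t process_items = (size_t)ROWS * COLS;
    for (size_t gid = 0; gid < process_items; ++gid)   /* launch #2 */
        process_kernel(gid);
    return process_items;
}
```

The result matches the option 1 sketch, but the processing step now scales with the number of elements (100000) instead of the number of rows (10).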

(Additions)

OpenCL kernels executing on GPUs are data parallel: every work item runs the same kernel code, but each one works on a different piece of data, and typically one kernel occupies the device at a time. It might be worth rethinking your algorithm to fit this model.

From what you have written in your comments, it sounds like you want to run 10 items at a time because of memory constraints. Be aware, though, that kernels cannot allocate memory dynamically in OpenCL: all buffers are allocated up front by the host. So the host should determine how many tasks fit into the available memory and run the work items in batches, transferring buffers as appropriate.
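
One way the host could plan those batches, sketched in C under stated assumptions: `mem_bytes` is the amount of device memory you are willing to spend on the cache (on a real device you might derive it from `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE` or `CL_DEVICE_MAX_MEM_ALLOC_SIZE`), and each task needs one row of `row_len` floats. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* How many rows (tasks) fit into the memory budget at once. */
static size_t rows_per_batch(size_t mem_bytes, size_t row_len)
{
    size_t row_bytes = row_len * sizeof(float);
    assert(row_bytes > 0);
    return mem_bytes / row_bytes;
}

/* Number of launches needed to cover all tasks (ceiling division). */
static size_t batch_count(size_t total_tasks, size_t per_batch)
{
    assert(per_batch > 0);
    return (total_tasks + per_batch - 1) / per_batch;
}

/* Size of batch number b; the last batch may be smaller. */
static size_t batch_size(size_t total_tasks, size_t per_batch, size_t b)
{
    size_t start = b * per_batch;
    assert(start < total_tasks);
    size_t remaining = total_tasks - start;
    return remaining < per_batch ? remaining : per_batch;
}
```

For each batch, the host would write that batch's inputs into the same fixed-size buffers (for example with `clEnqueueWriteBuffer`), enqueue the kernels with a global size of `batch_size(...)`, and read the results back before starting the next batch.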

Also, how are the buffers being filled? From files? OpenCL kernels cannot read from files, the network, etc., so if that is how the original data is loaded, the loading has to be done on the host. If, however, these image buffers are generated from other sources (e.g. by an algorithm or from another in-memory source), that should work fine, though you would need to copy any other in-memory sources to the GPU as well.
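
Since only the host can do file I/O, a loader might look like the C sketch below: it reads raw `float` rows from a `FILE *` into a host buffer, which the host would then copy to the device (in real code with `clEnqueueWriteBuffer`; that step is only indicated in a comment here). The binary file layout and the `load_rows` name are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reads up to max_rows rows of row_len floats from f into dst and
   returns the number of complete rows read. Assumes (hypothetically)
   that the file holds raw native-endian floats with no header. */
static size_t load_rows(FILE *f, float *dst, size_t row_len, size_t max_rows)
{
    assert(f != NULL && dst != NULL && row_len > 0);
    size_t n = fread(dst, sizeof(float), row_len * max_rows, f);
    /* Next, the host would copy dst into the device buffer,
       e.g. with clEnqueueWriteBuffer, before enqueuing kernels. */
    return n / row_len;
}
```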
