Reputation: 7906
All this CUDA lark is head-melting in its power, but something I've been wondering about is the hard limits on 1D block/grid dimensions (usually 512 and 65535 respectively).
When dealing with problems that are much larger in scope (on the order of billions of elements), is there an automated, programmatic way of effectively setting up a 'queue' through a kernel, or is it a case of manual slicing and dicing?
How does everyone deal with problem-partitioning?
Upvotes: 2
Views: 1377
Reputation: 78478
There are two basic ways of partitioning your data so that you can work on it using CUDA:

1. Split the data into chunks, each sized to fit within the grid limits, and launch the kernel once per chunk.
2. Launch a single grid and have each thread loop over more than one data element.

I have explained these techniques with simple examples here. Method 2 is typically easier to code and work with for most tasks, as sketched below.
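A minimal sketch of method 2, assuming a simple element-wise operation (the kernel name and launch configuration are illustrative only): each thread strides through the array, so a grid of fixed, legal size can cover any number of elements.

__global__ void scale(float *data, size_t n, float factor)
{
    // One stride = total number of threads in the grid
    size_t stride = (size_t)blockDim.x * gridDim.x;

    // Each thread processes elements i, i + stride, i + 2*stride, ...
    for (size_t i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += stride)
        data[i] *= factor;
}

// e.g. scale<<<4096, 512>>>(d_data, n, 2.0f); -- any grid within the limits works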
Upvotes: 1
Reputation: 72349
If one dimensional grids are too small, just use two dimensional (or three dimensional on Fermi with CUDA 4.0) grids instead. Dimensionality in grid and block layouts is really only for convenience - it makes the execution space look like the sort of common data parallel input spaces programmers are used to working with (matrices, grids, voxels, etc). But it is only a very small abstraction away from the underlying simple linear numbering scheme, which can handle over 10^12 unique thread IDs within a single kernel launch.
In grids, ordering is column major, so if you had a 1D grid problem before, the "unique, 1D thread index" was calculated as:
unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
which has a theoretical upper limit of 512 * 65535 = 33553920 unique threads. The equivalent 2D grid problem is only a simple extension of the 1D case:
size_t tidx = threadIdx.x + blockIdx.x * blockDim.x;
size_t tid = tidx + (size_t)blockIdx.y * blockDim.x * gridDim.x;
which has a theoretical upper limit of 512 * 65535 * 65535 = 2198956147200 unique threads. Fermi will let you add a third dimension to the grid, also of 65535 maximum size, which gives up to about 10^17 threads in a single execution grid. Which is rather a lot.
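As a sketch of how this fits together (the kernel, its name, and the sizing logic below are illustrative assumptions, not a prescribed recipe): the host code rounds the block count up to cover n elements, spills into the second grid dimension once the first would exceed 65535, and the kernel guards against the threads that fall past the end of the data.

__global__ void increment(float *data, size_t n)
{
    size_t tidx = threadIdx.x + blockIdx.x * blockDim.x;
    size_t tid  = tidx + (size_t)blockIdx.y * blockDim.x * gridDim.x;
    if (tid < n)            // the grid is rounded up, so trailing threads do nothing
        data[tid] += 1.0f;
}

void launch(float *d_data, size_t n)
{
    if (n == 0) return;
    const unsigned int block = 512;
    size_t blocks = (n + block - 1) / block;                    // blocks needed to cover n
    unsigned int gx = (unsigned int)(blocks < 65535 ? blocks : 65535);
    unsigned int gy = (unsigned int)((blocks + gx - 1) / gx);   // second dimension takes the overflow
    increment<<<dim3(gx, gy), block>>>(d_data, n);
}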
Upvotes: 1