Gabor Szita

Reputation: 329

What to do if I have more work-items than SIZE_MAX in OpenCL

My OpenCL program involves having about 7 billion work-items. In my C++ program, I would set this to my global_item_size:

size_t global_item_size = 7200000000;

If my program is compiled for 64-bit systems (x64), this global size is fine, since SIZE_MAX (the maximum value of size_t) is much larger than 7 billion. However, to ensure backwards compatibility I want my program to also compile for 32-bit systems (x86). On 32-bit systems, SIZE_MAX is about 4 billion, less than my global size of 7 billion. If I tried to set the global size to 7 billion, it would overflow. What can I do in this case?

One solution I was considering is to use a multi-dimensional global size and local size. However, this requires the kernel to reconstruct the original linear global ID from the multi-dimensional IDs (my kernel heavily depends on the global and local size), which would cost some performance.
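For reference, the multi-dimensional split could be sketched on the host side as below. This is only an illustration, not code from the question: `split_range`, the 2^31 cap, and the local size of 64 are assumed names and values. Inside the kernel, the linear ID would then be recovered as `get_global_id(1) * get_global_size(0) + get_global_id(0)`.

```cpp
#include <cstdint>

// Illustrative helper: split a 64-bit item count into a 2D NDRange
// {dim0, dim1} with dim0 * dim1 >= total, where each dimension fits
// comfortably in 32 bits and dim0 is a multiple of the local size so
// that work-groups divide the first dimension evenly.
struct Range2D { uint64_t dim0, dim1; };

Range2D split_range(uint64_t total, uint64_t local0) {
    const uint64_t max_dim = 1ull << 31;          // stay well below 2^32
    uint64_t dim0 = max_dim - (max_dim % local0); // round down to a multiple of local0
    uint64_t dim1 = (total + dim0 - 1) / dim0;    // ceiling division
    return {dim0, dim1};
}
```

Because dim0 * dim1 can exceed the requested total, the kernel would also need a guard such as `if (id >= total) return;` for the padding threads.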

The other solution I considered was to launch multiple kernels. I think this would be a little "sloppy", and synchronizing the kernel launches would add overhead.

So my question basically is: How can I (if possible) make the global size larger than the maximum size of size_t? If this is not possible, what are some workarounds?

Upvotes: 0

Views: 239

Answers (2)

Rob

Reputation: 116

If you want to avoid batches, you can give each kernel more work by wrapping the kernel code in a for loop. E.g.

for (int i = 0; i < WORK_ITEMS_PER_THREAD; ++i)
{
    // reconstruct the linear index of the data item this iteration handles
    size_t id = WORK_ITEMS_PER_THREAD * get_global_id(0) + i;

    ...
}
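On the host side, the matching launch size would be the total item count divided by `WORK_ITEMS_PER_THREAD`, rounded up. A minimal sketch of that arithmetic (the helper name `threads_needed` is an assumption, not from the answer):

```cpp
#include <cstdint>

// Number of work-items to launch when each thread processes
// items_per_thread data items (ceiling division so no item is dropped).
uint64_t threads_needed(uint64_t total_items, uint64_t items_per_thread) {
    return (total_items + items_per_thread - 1) / items_per_thread;
}
```

With `WORK_ITEMS_PER_THREAD = 4`, the 7.2 billion items shrink to 1.8 billion work-items, which fits a 32-bit size_t. If the total is not evenly divisible, the kernel loop also needs a bounds check such as `if (id >= total_items) break;`.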

Upvotes: 1

ProjectPhysX

Reputation: 5746

Try to use uint64_t global_item_size = 7200000000ull; to avoid 32-bit integer overflow.

If you are strictly limited to the maximum 32-bit number of work items, you could do the computation in several batches (exchange GPU buffers in between compute steps via PCIe transfer) or you could pack several data items into one GPU thread.
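The batching idea could be sketched on the host like this. This is only an illustration of the splitting arithmetic under assumed names (`Batch`, `make_batches`); each batch would then be launched with `clEnqueueNDRangeKernel` using `count` as the global size, with `offset` passed to the kernel (e.g. as a kernel argument, since the built-in global work offset is itself a size_t on a 32-bit host):

```cpp
#include <cstdint>
#include <vector>

// Illustrative helper: split a 64-bit total into batches of at most
// max_batch items each, e.g. max_batch chosen below SIZE_MAX on a
// 32-bit host. Each batch records its starting offset and item count.
struct Batch { uint64_t offset, count; };

std::vector<Batch> make_batches(uint64_t total, uint64_t max_batch) {
    std::vector<Batch> batches;
    for (uint64_t off = 0; off < total; off += max_batch) {
        uint64_t remaining = total - off;
        uint64_t count = remaining < max_batch ? remaining : max_batch;
        batches.push_back({off, count});
    }
    return batches;
}
```

Inside the kernel, each thread would then work on item `offset + get_global_id(0)`.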

Upvotes: 1
