Gabor Szita

Reputation: 329

What causes and how can I check the number of work-groups limit in OpenCL?

I've recently started using OpenCL to write programs for GPUs. I'm familiar with the basic concepts required to write efficient OpenCL programs, such as work-items, work-groups, global item size, and barriers.

One of my programs involved launching about 20 million work-groups with 360 work-items in each work-group. However, for some reason OpenCL couldn't handle that many work-groups. All elements of my output array simply remained 0. In addition, OpenCL didn't even start the calculations when I called clEnqueueNDRangeKernel(): when I viewed the GPU usage stats, I didn't see the "spike" that usually appears when I run an OpenCL kernel. I then reduced the number of work-groups to find the maximum that still works. It was 5965232, and it is always exactly 5965232.

I know that the problem is NOT with the number of work-items. It is with the number of work-groups. To prove this, here is my original code, where LIST_SIZE is 360.

global_item_size = 5965232*LIST_SIZE;
local_size = LIST_SIZE;

and a modified version of my code:

global_item_size = 5965232*LIST_SIZE*1.3;
local_size = LIST_SIZE*1.3;

In both scenarios, the limit on the number of work-groups was 5965232.

I'm trying to find out what causes this limit and how to check it. I understand that there may be a limitation, but what causes it, and how can I query this limit in OpenCL? I've done a lot of research, but every site I found discusses limits on work-group size, not limits on the number of work-groups.

I'm using the Intel HD Graphics 4000 GPU with an i5-3320M. It has 32 MB of integrated RAM.

Upvotes: 1

Views: 153

Answers (1)

ProjectPhysX

Reputation: 5746

5965232*360 = 2147483520 < 2147483647 = 2^31-1 = maximum 32-bit signed integer value

You are dealing with a classic 32-bit integer overflow in the multiplication in this line:

global_item_size = 5965232*LIST_SIZE;

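Assuming LIST_SIZE is an int, both operands are 32-bit signed integers, so the product is evaluated in 32-bit arithmetic and overflows for any larger work-group count, regardless of the type of global_item_size. Try global_item_size = 5965232ull*(uint64_t)LIST_SIZE; instead, and make sure global_item_size itself has a 64-bit type such as uint64_t.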
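To check the arithmetic without triggering the overflow itself, here is a minimal sketch in plain C. The helper names total_work_items and fits_in_int32 are hypothetical and not part of OpenCL; the sketch only assumes that the work-group count and the local size are available as integers.

```c
#include <stdint.h>
#include <limits.h>

/* Hypothetical helper: total number of work-items, computed in 64-bit
 * unsigned arithmetic so the product cannot wrap at 2^31-1. */
static uint64_t total_work_items(uint64_t num_groups, uint64_t local_size) {
    return num_groups * local_size;
}

/* Hypothetical helper: returns 1 if the same product would still fit in a
 * 32-bit signed int, 0 if an int multiplication would overflow. */
static int fits_in_int32(uint64_t num_groups, uint64_t local_size) {
    return total_work_items(num_groups, local_size) <= (uint64_t)INT32_MAX;
}
```

With LIST_SIZE = 360, fits_in_int32(5965232, 360) holds, while fits_in_int32(5965233, 360) does not, which matches the limit observed in the question. For the intended 20 million work-groups, total_work_items(20000000, 360) gives 7200000000, which needs a 64-bit type.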

Upvotes: 2
