KLee1

Reputation: 6178

Computing Maximum Concurrent Workgroups

I was wondering if there is a standard way to programmatically determine the maximum number of concurrent workgroups that can run on a GPU.

For example, on an NVIDIA card with 5 compute units (or SMs), there can be a maximum of 8 workgroups (or blocks) per compute unit, so the maximum number of workgroups that can run concurrently is 40.

Since I can find the number of compute units with clGetDeviceInfo, all I need is the maximum number of workgroups that can be run on a compute unit.
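To make the calculation concrete, here is a minimal sketch of the arithmetic. The function name is made up for illustration; `compute_units` is the value that `clGetDeviceInfo` reports for `CL_DEVICE_MAX_COMPUTE_UNITS`, while `groups_per_unit` is the missing piece and is assumed here to be supplied by hand.

```python
def max_concurrent_workgroups(compute_units, groups_per_unit):
    """Upper bound on workgroups resident on the device at once.

    compute_units   -- value of CL_DEVICE_MAX_COMPUTE_UNITS
    groups_per_unit -- per-compute-unit limit (assumed known, e.g. from vendor docs)
    """
    return compute_units * groups_per_unit


# The example from the question: 5 compute units, 8 workgroups each.
print(max_concurrent_workgroups(5, 8))  # 40
```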

Thanks!

Upvotes: 3

Views: 1892

Answers (2)

Manish Kumar

Reputation: 1479

The maximum number of work-groups per execution unit/SM is limited by the hardware resources. Take the Intel Gen8 GPU as an example: it has 16 barrier registers per sub-slice, so no more than 16 work-groups can run simultaneously on a sub-slice.

The amount of shared local memory available per sub-slice (64 KB) is another limit. If, for example, a work-group requires 32 KB of shared local memory, only 2 of those work-groups can run concurrently, regardless of work-group size.
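As a rough sketch, the two limits above combine by taking whichever is smaller. The function and parameter names are hypothetical, the defaults are the Gen8 numbers quoted in this answer, and other resources (registers, hardware threads) may lower the result further on real hardware.

```python
def groups_per_subslice(slm_per_group_bytes,
                        barrier_registers=16,        # Gen8 figure from this answer
                        slm_bytes=64 * 1024):        # Gen8 figure from this answer
    """Estimate how many work-groups fit on one sub-slice at the same time.

    Only the barrier-register and shared-local-memory limits are modelled.
    """
    limit = barrier_registers
    if slm_per_group_bytes > 0:
        # Each resident work-group needs its own slice of shared local memory.
        limit = min(limit, slm_bytes // slm_per_group_bytes)
    return limit


print(groups_per_subslice(32 * 1024))  # 2  (limited by shared local memory)
print(groups_per_subslice(1024))       # 16 (limited by barrier registers)
```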

Upvotes: 3

mfa

Reputation: 5087

I typically use the number of compute units as the number of work groups. I like to scale up the size of the groups to saturate the hardware, rather than force the GPU to schedule many work groups 'simultaneously'.

I don't know of a way to determine the max number of groups without looking it up in the vendor specs.
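A minimal sketch of that sizing approach, assuming each work-item loops over several elements. The function name is made up for illustration; `compute_units` and `max_group_size` are the values reported for `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```python
def launch_shape(n_items, compute_units, max_group_size):
    """One work-group per compute unit, each as large as the device allows.

    Returns (global_size, local_size, items_per_work_item).
    """
    local_size = max_group_size
    global_size = compute_units * local_size
    # Ceiling division: every work-item processes this many elements at most.
    items_per_work_item = -(-n_items // global_size)
    return global_size, local_size, items_per_work_item


print(launch_shape(1_000_000, 5, 256))  # (1280, 256, 782)
```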

Upvotes: -1
