Zhibo Shen

Reputation: 145

Inconsistency between OpenGL and CUDA maximum number of threads

My GPU is an NVIDIA GeForce GT440, which has compute capability 2.x. NVIDIA's official CUDA_C_Programming_Guide lists the following limits:

Limit 1. Maximum number of threads per block = 1024
Limit 2. Maximum number of resident threads per multiprocessor = 1536

However, one of the OpenGL compute shader implementation limits is

Limit 3. GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS = 1536

My questions are
1. Why is Limit 1 not equal to Limit 2 and Limit 3?
2. Should the real threads/block (invocations/workgroup) be 1024 or 1536?

Upvotes: 0

Views: 324

Answers (1)

talonmies

Reputation: 72352

Why is Limit 1 not equal to Limit 2 and Limit 3?

Because they aren't the same thing. Blocks are a logical construct in CUDA and are limited to a maximum of 1024 threads. But a multiprocessor (SM) can run multiple blocks concurrently (up to 8 in the case of your hardware). So an SM can have up to 1536 concurrent threads on your hardware, but not all of those threads can come from a single block.
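To make the arithmetic concrete, here is a rough sketch in plain C++ (host-side only, no CUDA calls) of how the per-block and per-SM limits interact. The three constants are the compute capability 2.x figures quoted in this thread; the helper names `residentBlocksPerSM` and `residentThreadsPerSM` are made up for illustration, and the model deliberately ignores register usage, shared memory, and warp granularity, all of which can lower the real numbers further.

```cpp
#include <algorithm>
#include <cassert>

// Compute capability 2.x limits as quoted in this thread.
constexpr int kMaxThreadsPerBlock = 1024;  // Limit 1
constexpr int kMaxThreadsPerSM    = 1536;  // Limit 2
constexpr int kMaxBlocksPerSM     = 8;     // resident blocks per multiprocessor

// Hypothetical helper: how many blocks of the given size can be resident
// on one SM at the same time. Simplified model: only the thread-count and
// block-count limits are considered (no registers, shared memory, or warps).
int residentBlocksPerSM(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    if (threadsPerBlock > kMaxThreadsPerBlock) {
        return 0;  // a block this large cannot be launched at all
    }
    return std::min(kMaxBlocksPerSM, kMaxThreadsPerSM / threadsPerBlock);
}

// Hypothetical helper: total resident threads per SM for that block size.
int residentThreadsPerSM(int threadsPerBlock) {
    return residentBlocksPerSM(threadsPerBlock) * threadsPerBlock;
}
```

Under this simplified model, a block size of 1024 gives one resident block (1024 threads) per SM, while 768 or 512 threads per block gives two or three resident blocks and reaches the full 1536. A block size of 1536 is rejected outright because it exceeds the per-block limit.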

Should the real threads/block be 1024 or 1536?

1024 for all the reasons outlined above. You can see a complete summary of the capabilities of all supported hardware here.

Upvotes: 2
