Reputation: 145
My GPU is an NVIDIA GeForce GT440, which has compute capability 2.x. NVIDIA's official CUDA C Programming Guide lists these limits for it:
Limit 1. Maximum number of threads per block = 1024
Limit 2. Maximum number of resident threads per multiprocessor = 1536
However, the OpenGL compute shader implementation limits include
Limit 3. GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS = 1536
My questions are
1. Why is Limit 1 not equal to Limit 2 and Limit 3?
2. Should the real threads/block (invocations/workgroup) be 1024 or 1536?
Upvotes: 0
Views: 324
Reputation: 72352
Why is Limit 1 not equal to Limit 2 and Limit 3?
Because they are not the same thing. Blocks are a logical construct in CUDA and are limited to a maximum of 1024 threads. A multiprocessor (SM), however, can run multiple blocks concurrently (up to 8 on your hardware). So an SM can hold up to 1536 concurrent threads on your hardware, but those threads cannot all come from a single block.
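As a rough illustration of how the per-block and per-SM limits interact, here is a minimal sketch in plain Python (arithmetic only, no CUDA calls). The function names are hypothetical, and the model is a simplification: it counts only the thread and block limits quoted above and ignores register usage, shared memory, and warp granularity, all of which can lower the real number of resident blocks.

```python
# Limits for compute capability 2.x, as quoted in the question and answer.
MAX_THREADS_PER_BLOCK = 1024  # Limit 1: threads in a single block
MAX_THREADS_PER_SM = 1536     # Limit 2: resident threads per multiprocessor
MAX_BLOCKS_PER_SM = 8         # resident blocks per multiprocessor


def resident_blocks_per_sm(threads_per_block):
    """Upper bound on how many blocks of this size fit on one SM.

    Hypothetical helper: considers thread and block limits only.
    """
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("block size violates the per-block limit")
    return min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)


def resident_threads_per_sm(threads_per_block):
    """Upper bound on resident threads per SM for this block size."""
    return resident_blocks_per_sm(threads_per_block) * threads_per_block


for size in (1024, 768, 512, 128):
    print(size, resident_blocks_per_sm(size), resident_threads_per_sm(size))
```

Under these assumptions, a 1024-thread block leaves the SM with only one resident block (1024 threads), whereas two 768-thread blocks or three 512-thread blocks reach the full 1536. A block of 1536 threads is rejected outright, which is the point of Limit 1.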
Should the real threads/block be 1024 or 1536?
1024, for the reasons outlined above. You can see a complete summary of the capabilities of all supported hardware here.
Upvotes: 2