Reputation: 857
Suppose a CUDA GPU can have 48 simultaneously active warps on one multiprocessor, that is, 48 blocks of one warp, or 24 blocks of 2 warps, and so on. Since all the active warps from multiple blocks are scheduled for execution, it seems the size of the block is not important for the occupancy of the GPU (of course it should be a multiple of 32): whether it is 32, 64, or 128 makes no difference, right? So the block size is determined only by the computation task and the resource limits (shared memory or registers)?
Upvotes: 2
Views: 2487
Reputation: 21818
There are multiple factors worth considering that you omit.
Upvotes: 3
Reputation: 48330
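No. The block size does matter.
With a block size of 32 threads, occupancy is very low, because each multiprocessor also limits the number of resident blocks (8 on compute capability 2.x devices, which is where the 48-warp limit in the question comes from): 8 blocks of one warp fill only 8 of the 48 warp slots. With a block size of 256 threads (8 warps), 6 resident blocks fill all 48 slots, so occupancy is high. Going beyond 256 threads per block rarely makes much difference.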
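To make the arithmetic concrete, here is a rough sketch of how block size maps to occupancy. It is not an official formula: the function name `occupancy` is made up for illustration, the 48-warp limit is taken from the question, and the limit of 8 resident blocks per multiprocessor is an assumption that holds for compute capability 2.x devices. Register and shared-memory limits are deliberately ignored, so real occupancy can be lower.

```python
def occupancy(block_size, max_warps_per_sm=48, max_blocks_per_sm=8, warp_size=32):
    """Fraction of warp slots filled on one multiprocessor.

    Hypothetical helper: the defaults assume a compute capability 2.x
    device (48 warps and 8 resident blocks per multiprocessor) and
    ignore register and shared-memory limits.
    """
    # Warps needed per block, rounded up to a whole warp.
    warps_per_block = -(-block_size // warp_size)
    # Blocks that fit if only the warp limit applied.
    blocks_by_warps = max_warps_per_sm // warps_per_block
    # The resident-block limit may cap this further.
    resident_blocks = min(max_blocks_per_sm, blocks_by_warps)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    for block_size in (32, 64, 128, 192, 256, 512):
        print(block_size, f"{occupancy(block_size):.0%}")
```

Under these assumptions, small blocks hit the resident-block limit long before the warp limit, which is why 32-thread blocks leave most warp slots empty while 192 or 256 threads per block can fill them all.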
As the architecture involved is complex, testing it with your software is always the best approach.
Upvotes: -1