Will 32 threads from 32 blocks be scheduled as a warp?

Question

I understand that in CUDA, 32 adjacent threads in the same block are scheduled as a warp. But I frequently find tutorial CUDA code that launches multiple blocks with 1 thread per block. In this model, will 32 threads from 32 blocks be scheduled as a warp? If not, can I say this model is less efficient than organizing the work into 32 threads per block? Thanks!

Robert Crovella · Accepted Answer

No, threads from different blocks cannot be scheduled in the same warp. If you create grids of threadblocks with only a single thread each, you're definitely not getting the full performance from the machine. It's less efficient than having 32 (or an integer multiple of 32) threads per block. A Fermi SM, for example, has 32 warp lanes that can be in use. If you are scheduling blocks of a single thread, each block's lone thread occupies a warp by itself, so only 1 of those 32 lanes can be in use at any given time.
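
To put rough numbers on that, here is a minimal host-side C++ sketch (plain arithmetic, no GPU required). The names `LaunchCost` and `launch_cost` are hypothetical helpers, not part of the CUDA API, and the warp size of 32 is an assumption that holds on NVIDIA GPUs to date.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // assumed warp size (32 on NVIDIA GPUs to date)

// Hypothetical summary of what a launch configuration costs in warps.
struct LaunchCost {
    long long blocks;         // blocks needed to cover all threads
    long long warps;          // warps the hardware must schedule in total
    double lane_utilization;  // fraction of warp lanes doing useful work
};

// Hypothetical helper: cost of covering total_threads with blocks of
// threads_per_block threads. Warps never span blocks, so every block
// rounds up to a whole number of warps on its own.
inline LaunchCost launch_cost(long long total_threads, int threads_per_block) {
    assert(total_threads > 0 && threads_per_block > 0);
    long long blocks =
        (total_threads + threads_per_block - 1) / threads_per_block;
    long long warps_per_block =
        (threads_per_block + kWarpSize - 1) / kWarpSize;
    long long warps = blocks * warps_per_block;
    double utilization = static_cast<double>(total_threads) /
                         (static_cast<double>(warps) * kWarpSize);
    return {blocks, warps, utilization};
}
```

For 1024 threads, `launch_cost(1024, 1)` yields 1024 warps with 1/32 of the lanes in use, while `launch_cost(1024, 32)` yields 32 fully populated warps.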

Threads have a thread ID (the threadIdx built-in variable), which is defined within (and unique only to) a single block.
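
As a sketch of why that matters for warp formation: a thread's warp and lane follow from its index within its own block, and the block index plays no part. The code below is host-side C++ with plain-int stand-ins for `threadIdx` and `blockDim`; `Index3`, `WarpSlot`, `linear_thread_id`, and `warp_slot` are hypothetical names, and the warp size of 32 is an assumption.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // assumed warp size

// Plain-int stand-ins for CUDA's threadIdx and blockDim (hypothetical
// types so the sketch compiles without the CUDA toolkit).
struct Index3 { int x, y, z; };

struct WarpSlot {
    int warp;  // warp index within the block
    int lane;  // lane index within that warp
};

// Linear thread ID within the block: x varies fastest, then y, then z.
inline int linear_thread_id(Index3 thread_idx, Index3 block_dim) {
    assert(thread_idx.x >= 0 && thread_idx.x < block_dim.x);
    assert(thread_idx.y >= 0 && thread_idx.y < block_dim.y);
    assert(thread_idx.z >= 0 && thread_idx.z < block_dim.z);
    return thread_idx.x + thread_idx.y * block_dim.x +
           thread_idx.z * block_dim.x * block_dim.y;
}

// Consecutive linear IDs are grouped into warps of kWarpSize threads.
// Note that no block index appears anywhere in this calculation.
inline WarpSlot warp_slot(Index3 thread_idx, Index3 block_dim) {
    int tid = linear_thread_id(thread_idx, block_dim);
    return {tid / kWarpSize, tid % kWarpSize};
}
```

With one thread per block, every thread is thread 0 of its block and therefore lane 0 of that block's only warp; the other 31 lanes have no thread to run.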

The Hardware Multithreading section of the CUDA C Programming Guide gives a formula that defines the total number of warps in a single block.
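
As I read the guide, that formula is ceil(T / Wsize, 1), where T is the number of threads per block, Wsize is the warp size (32), and ceil(x, y) means x rounded up to the nearest multiple of y. A minimal host-side C++ sketch of it follows; `ceil_to_multiple` and `warps_in_block` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

constexpr int kWarpSize = 32;  // Wsize in the guide's formula (assumed 32)

// ceil(x, y) as described above: x rounded up to the nearest multiple of y.
inline double ceil_to_multiple(double x, double y) {
    assert(y > 0.0);
    return std::ceil(x / y) * y;
}

// Total number of warps in a block of T threads: ceil(T / Wsize, 1).
inline int warps_in_block(int threads_per_block) {
    assert(threads_per_block > 0);
    double ratio = static_cast<double>(threads_per_block) / kWarpSize;
    return static_cast<int>(ceil_to_multiple(ratio, 1.0));
}
```

A block of 1 thread and a block of 32 threads both come out to one warp, which is why the single-thread block wastes 31 lanes; a block of 33 threads needs two warps.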
