Why is only one warp executed by an SM at a time in CUDA?

Question

I frequently found the following words in some CUDA materials:

"At any time, only one of the warps is executed by a SM".

I don't quite understand this. Since each SM can run hundreds to thousands of threads simultaneously, why can only a single warp, which is 32 threads, be executed at any given moment?

Thanks!

Paul R · Accepted Answer

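Details vary between generations of CUDA hardware, but, for example, in early generations each SM has 8 execution units, each of which executes 4 threads (one instruction from each of those threads every 4 cycles). Hence you get 4-way SMT, which gives 32 concurrent threads, i.e. one warp, per SM. That is where the "one warp at a time" statement comes from; later architectures (Fermi onwards) have more than one warp scheduler per SM and can issue instructions from several warps in the same cycle.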

Of course there are multiple SMs per GPU, e.g. 30, which means 30 × 32 = 960 threads executing at any given instant. On top of this, warps can be switched in and out, so you can have many more than 960 "live" (resident) threads, even though only 960 of them are actually executing at any given time. This switching is what hides memory latency: while one warp is waiting on a load, the SM can issue instructions from another resident warp that is ready.
