Hailiang Zhang

Reputation: 18960

Why is only one warp executed by an SM at a time in CUDA?

I frequently come across the following statement in CUDA materials:

"At any time, only one of the warps is executed by a SM".

I don't quite understand this. Since each SM can run hundreds to thousands of threads simultaneously, why can only a single warp, which is 32 threads, be executed at any point in time?

Thanks!

Upvotes: 5

Views: 2459

Answers (2)

Paul R

Reputation: 213200

Details vary between generations of CUDA hardware. In the earlier generations, for example, each SM has 8 execution units, each of which executes 4 threads (one instruction from each thread every 4 cycles). Hence you get 4-way SMT, which gives 8 x 4 = 32 concurrent threads per SM, i.e. one warp.

Of course, there are multiple SMs per GPU, e.g. 30, which would mean 30 x 32-thread warps = 960 threads executing at any given instant. On top of this, warps can be switched in and out, so you can have many more than 960 "live" threads, even though only 960 of them are actually executing at any given time.
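The arithmetic in this answer can be written out as a short Python sketch. The hardware figures are the ones quoted in the answer; `live_threads` and its `resident_warps_per_sm` parameter are hypothetical names introduced here only to illustrate the difference between "executing" and "live" threads, and the value passed to it is illustrative, not a hardware limit.

```python
# Figures taken from the example in the answer (an early-generation GPU).
EXECUTION_UNITS_PER_SM = 8
THREADS_PER_EXECUTION_UNIT = 4   # each unit works through 4 threads of a warp
SMS_PER_GPU = 30

# One warp: 8 units x 4 threads each.
threads_per_warp = EXECUTION_UNITS_PER_SM * THREADS_PER_EXECUTION_UNIT

# Threads actually executing across the whole GPU at one instant.
executing_threads = SMS_PER_GPU * threads_per_warp


def live_threads(resident_warps_per_sm):
    """Threads that are resident ("live") if each SM holds this many warps.

    resident_warps_per_sm is an illustrative assumption, not a quoted limit.
    """
    return SMS_PER_GPU * resident_warps_per_sm * threads_per_warp


print(threads_per_warp)    # 32
print(executing_threads)   # 960
print(live_threads(8))     # 7680
```

With 8 resident warps per SM, only one in eight live threads is executing at any instant; the rest are waiting to be switched in.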

Upvotes: 5

Greg Smith

Reputation: 11549

The statement is true of the Tesla architecture, but it is incorrect for Fermi and Kepler. It is easier to look at the SM in terms of warp schedulers. On each cycle, a warp scheduler selects an eligible warp (a warp that is not stalled) and dispatches one or two instructions from that warp to the execution units. The number of execution units per SM is documented in the Fermi and Kepler whitepapers. The number of CUDA cores roughly equates to the number of execution units that can perform integer and single-precision floating-point operations. There are additional execution units for load/store operations, branching, etc.
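To make the idea of "selecting an eligible warp each cycle" concrete, here is a toy Python model of a single warp scheduler that issues at most one instruction per cycle. It is only a sketch under loud assumptions: the round-robin selection order, the fixed `latency` stall after every issue, and the function name `simulate` are all hypothetical and are not a description of the real hardware policy.

```python
def simulate(num_warps, cycles, latency):
    """Toy single-issue warp scheduler.

    Each cycle, pick the first non-stalled warp in round-robin order and
    issue one instruction from it. After issuing, that warp is assumed to
    be stalled for `latency` cycles (a made-up, fixed figure).
    Returns a list with the warp index issued on each cycle, or None if
    every warp was stalled.
    """
    stall = [0] * num_warps   # remaining stall cycles per resident warp
    start = 0                 # where the round-robin search begins
    trace = []
    for _ in range(cycles):
        chosen = None
        for offset in range(num_warps):
            w = (start + offset) % num_warps
            if stall[w] == 0:          # eligible: not stalled
                chosen = w
                break
        # Every stalled warp gets one cycle closer to being eligible.
        stall = [max(0, s - 1) for s in stall]
        if chosen is not None:
            stall[chosen] = latency
            start = (chosen + 1) % num_warps
        trace.append(chosen)
    return trace


print(simulate(num_warps=1, cycles=8, latency=3))
# [0, None, None, None, 0, None, None, None]
print(simulate(num_warps=4, cycles=8, latency=3))
# [0, 1, 2, 3, 0, 1, 2, 3]
```

In this model only one warp issues on any given cycle, yet keeping four warps resident is what lets the scheduler issue on every cycle instead of idling while a single warp waits.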

Compute Capability 1.x (Tesla)

  • 1 warp scheduler per SM
  • Dispatch 1 instruction per warp scheduler

Compute Capability 2.0 (Fermi 1st Generation)

  • 2 warp schedulers per SM
  • Dispatch 1 instruction per warp scheduler

Compute Capability 2.1 (Fermi 2nd Generation)

  • 2 warp schedulers per SM
  • Dispatch 1 or 2 instructions per warp scheduler

Compute Capability 3.x (Kepler)

  • 4 warp schedulers per SM
  • Dispatch 1 or 2 instructions per warp scheduler
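The list above can be turned into a small lookup to see how many warps an SM can select, and how many instructions it can dispatch at most, per cycle. This is a sketch derived only from the figures in the list; the names `SCHEDULERS` and `peak_issue_per_cycle` are hypothetical, and the results are upper bounds that ignore whether eligible warps and free execution units are actually available.

```python
# compute capability -> (warp schedulers per SM,
#                        max instructions dispatched per scheduler per cycle)
SCHEDULERS = {
    "1.x (Tesla)": (1, 1),
    "2.0 (Fermi 1st Generation)": (2, 1),
    "2.1 (Fermi 2nd Generation)": (2, 2),
    "3.x (Kepler)": (4, 2),
}


def peak_issue_per_cycle(compute_capability):
    """Return (warps selected per cycle, peak instructions dispatched per cycle)."""
    schedulers, max_dispatch = SCHEDULERS[compute_capability]
    return schedulers, schedulers * max_dispatch


for cc in SCHEDULERS:
    warps, instructions = peak_issue_per_cycle(cc)
    print(f"{cc}: up to {warps} warp(s), up to {instructions} instruction(s) per cycle")
```

Only the 1.x entry yields a single warp per cycle, which matches the point that the quoted statement holds for Tesla but not for Fermi or Kepler.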

Upvotes: 4
