Why is CUDA occupancy defined as the number of active warps over the maximum number of warps supported?

Question

Occupancy is defined as the number of active warps divided by the maximum number of warps supported on one Streaming Multiprocessor (SM). Say I have 4 blocks running on one SM and each block has 320 threads, i.e., 10 warps, so 40 warps on one SM. The occupancy is then 40/48, assuming the maximum number of warps on one SM is 48 (CC 2.x).
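The arithmetic above can be written out as a minimal sketch. The helper names `warps_per_block` and `occupancy` are hypothetical and not part of any CUDA API; the only assumptions are a warp size of 32 and the 48-warp limit quoted above for CC 2.x.

```cpp
// Warp size is 32 threads on the hardware discussed here.
constexpr int kWarpSize = 32;

// Warps occupied by one block; a partially filled warp still takes a whole warp slot.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Occupancy = resident warps on the SM / maximum resident warps the SM supports.
constexpr double occupancy(int blocks_per_sm, int threads_per_block, int max_warps_per_sm) {
    return static_cast<double>(blocks_per_sm * warps_per_block(threads_per_block))
           / max_warps_per_sm;
}
```

Calling `occupancy(4, 320, 48)` gives 40/48, roughly 0.83. Note that the number of CUDA cores does not appear anywhere in the calculation.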

But in total I have 320 * 4 = 1280 threads running on one SM, and there are only 48 CUDA cores on one SM. Why is the occupancy not 100%? I am using all the CUDA cores...

I am pretty sure I am missing something...

talonmies · Accepted Answer

Because occupancy has nothing to do with cores. CUDA is a pipelined, SIMD-style architecture. Your 48 cores are fed per-warp instructions from a pipeline (dual-issued, in fact). You need a lot of warps in flight to keep the instruction pipeline full: whenever one warp stalls on a dependency or a memory access, the scheduler needs another warp that is ready to issue, otherwise all the cores will stall. That is why occupancy is a somewhat useful metric for quantifying the ability of a given kernel to supply enough parallel work to achieve reasonable performance.
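As a rough illustration of why many warps are needed, here is a back-of-the-envelope sketch. The function name `warps_to_hide_latency` is hypothetical, and the latency and issue-rate figures in the usage note are illustrative assumptions, not measured values for any particular GPU.

```cpp
// Rough estimate: to keep the pipeline busy while one instruction's result is
// pending, the scheduler needs about (latency in cycles) x (warp instructions
// issued per cycle) independent warps ready to issue.
constexpr int warps_to_hide_latency(int latency_cycles, int warp_instr_per_cycle) {
    return latency_cycles * warp_instr_per_cycle;
}

// True if the resident warps meet the estimate (assuming one independent
// instruction available per warp, i.e. no instruction-level parallelism).
constexpr bool enough_warps(int resident_warps, int latency_cycles, int warp_instr_per_cycle) {
    return resident_warps >= warps_to_hide_latency(latency_cycles, warp_instr_per_cycle);
}
```

For example, assuming a 22-cycle arithmetic latency and 1 warp instruction issued per cycle, about 22 warps are needed, so the 40 resident warps from the question would be enough. With 2 warp instructions issued per cycle the estimate rises to 44, and 40 warps would fall slightly short. Memory latency is typically far longer, which is where high occupancy matters most.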
