Reputation: 43427
If we look at Pascal's SM architecture, it is made up of (for GP100) two "processing blocks" which have a warp scheduler and register file each.
Looking at GP102 this increases to 4 processing blocks.
My question: when a warp is scheduled onto one of the processing blocks, its registers are allocated in that processing block's register file. So it seems to me that the warp must stay resident in that specific part of the SM until it finishes executing. Is this the case, or could it ever be evicted?
Upvotes: 0
Views: 127
Reputation: 294
Each processing block (also called an SM sub-partition, or SMSP) can hold a maximum number of active warps that is limited by the architecture. In CUDA, there is no context switch at the block level, so once a block is assigned to an SM, it resides there until it completes execution. This means that a warp assigned to a sub-partition resides there until all warps of its block have finished executing. If a warp finishes earlier than the other warps in its block, it becomes passive; this happens when the load within the block is imbalanced. When all warps of an active block have completed, the block is evicted and its resources (registers, shared memory, etc.) become available for new blocks to be assigned.
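If you want to observe this yourself, here is a minimal sketch that samples the PTX special registers `%smid` and `%warpid` at the start and end of each warp and counts how many warps report a different location at exit. The kernel name `sample_residency`, the struct `WarpSample`, and the launch sizes are made up for illustration. Two caveats: the PTX documentation describes both registers as volatile, since their values may change if threads are rescheduled after preemption, and the mapping from a warp slot to a particular processing block is not documented, so the sketch only checks that the identifiers stay the same.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Read PTX special registers. Both report where the thread is located
// at the moment of the read.
__device__ unsigned int read_smid() {
    unsigned int v;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(v));
    return v;
}

__device__ unsigned int read_warpid() {
    unsigned int v;
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(v));
    return v;
}

// Hypothetical record type: one entry per warp.
struct WarpSample {
    unsigned int sm_start, warp_start, sm_end, warp_end;
};

__global__ void sample_residency(WarpSample* out, float* sink, int iters) {
    unsigned int sm0 = read_smid();
    unsigned int w0  = read_warpid();

    // Busy work so that the warp stays alive for a while.
    float x = static_cast<float>(threadIdx.x);
    for (int i = 0; i < iters; ++i) x = x * 1.0001f + 0.5f;

    unsigned int sm1 = read_smid();
    unsigned int w1  = read_warpid();

    int gtid = blockIdx.x * blockDim.x + threadIdx.x;
    sink[gtid] = x;  // store the result so the loop is not optimized away

    // Lane 0 of each warp records the samples (blockDim.x is a multiple of 32).
    if (threadIdx.x % warpSize == 0) {
        out[gtid / warpSize] = WarpSample{sm0, w0, sm1, w1};
    }
}

int main() {
    const int blocks = 64, threads = 128, iters = 1 << 16;  // arbitrary sizes
    const int warps = blocks * threads / 32;

    WarpSample* d_out = nullptr;
    float* d_sink = nullptr;
    cudaMalloc(&d_out, warps * sizeof(WarpSample));
    cudaMalloc(&d_sink, blocks * threads * sizeof(float));

    sample_residency<<<blocks, threads>>>(d_out, d_sink, iters);

    std::vector<WarpSample> h(warps);
    cudaError_t err = cudaMemcpy(h.data(), d_out, warps * sizeof(WarpSample),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int moved = 0;
    for (const WarpSample& s : h) {
        if (s.sm_start != s.sm_end || s.warp_start != s.warp_end) ++moved;
    }
    std::printf("%d of %d warps reported a different SM or warp slot at exit\n",
                moved, warps);

    cudaFree(d_out);
    cudaFree(d_sink);
    return 0;
}
```

Compile with `nvcc` and run on an otherwise idle GPU. Treat the result as a diagnostic observation on your hardware and driver, not as a guarantee from the programming model.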
Upvotes: 1