I have a question about how compute capability 1.3 and 2.0 GPUs schedule work. The maximum number of blocks that can be resident on a Streaming Multiprocessor (SM) at one time is 8 in both cases; at least, that's what I have noticed from the Occupancy Calculator.
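The 8-block cap is only one of several per-SM limits the Occupancy Calculator applies. Below is a simplified, hypothetical version of that calculation. It assumes only the block and warp limits matter (cc 1.3: 8 blocks and 32 warps per SM; cc 2.0: 8 blocks and 48 warps per SM) and ignores registers and shared memory, which the real calculator also accounts for. `LIMITS` and `resident_blocks` are illustrative names, not part of any CUDA API.

```python
# Per-SM residency limits for compute capability 1.3 and 2.0.
# Assumption: registers and shared memory are ignored in this sketch.
WARP_SIZE = 32
LIMITS = {
    "1.3": {"max_blocks": 8, "max_warps": 32},  # 1024 threads per SM
    "2.0": {"max_blocks": 8, "max_warps": 48},  # 1536 threads per SM
}


def resident_blocks(cc: str, threads_per_block: int) -> int:
    """Blocks of `threads_per_block` threads that can be resident on one SM."""
    sm = LIMITS[cc]
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    # Residency is capped by whichever limit is hit first.
    return min(sm["max_blocks"], sm["max_warps"] // warps_per_block)


for tpb in (64, 128, 256, 512):
    print(f"{tpb:3d} threads/block: cc1.3 -> {resident_blocks('1.3', tpb)}, "
          f"cc2.0 -> {resident_blocks('2.0', tpb)}")
```

With 256 threads per block, for instance, the warp limit caps residency at 4 blocks on cc 1.3 and 6 on cc 2.0. The number of cores per SM does not appear anywhere in the calculation.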
On a 1.3 card each SM has 8 cores, and on a 2.0 card each SM has 32 cores. How are the cores distributed among the blocks?
For 1.3, does each core process one block? And if so, when fewer than 8 blocks are resident on an SM, is more than one core assigned to each block?
For 2.0, if 8 blocks are scheduled on an SM, are 4 cores assigned to each block? And if fewer blocks are resident on an SM, are more cores assigned to each block?
Thank you.
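All cores in a single SM work in lockstep (at least up to cc 2.0). When the threads of the currently executing warp stall for some reason (for example, while waiting on a global memory load), the scheduler switches to another warp that is ready to run. That warp may come from the same threadblock or from a different one, i.e., from any of the up to 8 threadblocks currently resident on that SM. In other words, cores are not assigned to blocks at all: they execute whichever warp the scheduler has selected, regardless of which block it belongs to.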
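This scheduling behaviour can be illustrated with a toy model. It is a minimal sketch under loud assumptions: a single scheduler that issues one warp instruction per cycle in round-robin order, made-up per-instruction latencies, and hypothetical names (`Warp`, `schedule`). It is not how any real SM is implemented; for instance, a cc 1.3 SM actually takes four clocks to run one warp instruction across its 8 cores. What it does show is that the cores are never partitioned by block: each cycle they execute an instruction from whichever resident warp is ready, and warps from different blocks interleave freely.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Warp:
    block: int
    warp: int
    # Hypothetical latencies: cycles before the warp may issue its next instruction.
    program: list
    pc: int = 0
    ready_at: int = 0

    def done(self) -> bool:
        return self.pc >= len(self.program)


def schedule(warps, max_cycles=1000):
    """Toy single-issue scheduler: each cycle, issue one instruction from the
    first ready warp in round-robin order. The whole SM executes that one warp
    instruction, so no core ever belongs to a particular block."""
    queue = deque(warps)
    trace = []
    cycle = 0
    while any(not w.done() for w in queue) and cycle < max_cycles:
        issued = None
        for _ in range(len(queue)):
            w = queue[0]
            queue.rotate(-1)  # move the candidate to the back (round-robin)
            if not w.done() and w.ready_at <= cycle:
                issued = w
                break
        if issued is not None:
            latency = issued.program[issued.pc]
            issued.pc += 1
            issued.ready_at = cycle + latency
            trace.append((cycle, issued.block, issued.warp))
        else:
            trace.append((cycle, None, None))  # every resident warp is stalled
        cycle += 1
    return trace


# Two resident blocks with two warps each; the middle instruction is a slow "load".
warps = [Warp(block=b, warp=w, program=[1, 8, 1]) for b in range(2) for w in range(2)]
for cycle, block, warp in schedule(warps):
    print(cycle, "idle (all warps stalled)" if block is None else f"block {block}, warp {warp}")
```

In this run, cycles 8 to 11 are idle because all four warps are waiting on their long-latency instruction; with more resident warps (from more blocks), the scheduler would have had something to issue during those cycles.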
You may be interested in reading this section of the programming guide.