user1280671

Reputation: 69

Block processing patterns of GPU cards using their SM cores

I have a question about how blocks are scheduled on compute capability 1.3 and 2.0 GPU cards. In both cases, at most 8 blocks can be scheduled on a Streaming Multiprocessor (SM) at a time; at least, that is what I have seen in the Occupancy Calculator.

On a 1.3 card each SM has 8 cores, and on a 2.0 card each SM has 32 cores. How are the cores distributed among the blocks being processed?

On the 1.3 card, does each core process one block? If so, when fewer than 8 blocks are scheduled on an SM, is more than one core assigned to each block?

On the 2.0 card, if 8 blocks are scheduled on an SM, are 4 cores assigned to each block? And if fewer blocks are scheduled on the SM, are more cores assigned to each block?

Thank you.

Upvotes: 0

Views: 81

Answers (1)

Robert Crovella

Reputation: 151799

All cores in a single SM work in lockstep (at least up to cc 2.0). When the threads of the warp currently executing in lockstep hit a stall for some reason, the scheduler will bring in another warp, if one is ready to run. The new warp may be from the same threadblock or a different one, i.e., from among the up to 8 threadblocks that may currently be resident on that SM.
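To make the warp-based view concrete, here is a rough sketch that counts how many blocks and warps can be resident on one SM for a given block size. It is only a simplified model, not the Occupancy Calculator: it assumes the per-SM limits of 8 resident blocks (from the question) plus 32 resident warps for cc 1.3 and 48 for cc 2.0 (as listed in the programming guide's compute capability tables), and it ignores register and shared memory limits entirely. The names `SMLimits`, `resident_blocks`, and `resident_warps` are made up for this illustration.

```python
import math
from dataclasses import dataclass

WARP_SIZE = 32  # threads per warp


@dataclass(frozen=True)
class SMLimits:
    """Per-SM limits used by this simplified model (hypothetical helper type)."""
    cores: int
    max_resident_blocks: int
    max_resident_warps: int


# Assumed limits: 8 resident blocks per SM for both architectures,
# 32 resident warps per SM on cc 1.3 and 48 on cc 2.0.
CC_1_3 = SMLimits(cores=8, max_resident_blocks=8, max_resident_warps=32)
CC_2_0 = SMLimits(cores=32, max_resident_blocks=8, max_resident_warps=48)


def warps_per_block(threads_per_block: int) -> int:
    """A block is split into warps of 32 threads; a partial warp still counts."""
    return math.ceil(threads_per_block / WARP_SIZE)


def resident_blocks(limits: SMLimits, threads_per_block: int) -> int:
    """Blocks resident on one SM, ignoring register and shared memory usage."""
    by_warp_limit = limits.max_resident_warps // warps_per_block(threads_per_block)
    return min(limits.max_resident_blocks, by_warp_limit)


def resident_warps(limits: SMLimits, threads_per_block: int) -> int:
    """Warps the scheduler can choose from on one SM when a warp stalls."""
    return resident_blocks(limits, threads_per_block) * warps_per_block(threads_per_block)


if __name__ == "__main__":
    for name, limits in (("cc 1.3", CC_1_3), ("cc 2.0", CC_2_0)):
        for threads in (64, 256):
            print(
                f"{name}, {threads} threads/block: "
                f"{resident_blocks(limits, threads)} resident blocks, "
                f"{resident_warps(limits, threads)} resident warps, "
                f"{limits.cores} cores shared by all of them"
            )
```

In this model the core count never enters the block calculation: the cores are a shared execution resource, and what varies with block size is only how many warps are available for the scheduler to switch between.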

You may be interested in reading this section of the programming guide.

Upvotes: 3
