Reputation: 101

How does a CUDA kernel run on multiple blocks that each take a different amount of time?

Assume that we launch a kernel with 4 blocks {b1, b2, b3, b4}. The blocks require {10, 2, 3, 4} units of time, respectively, to complete their work, and our GPU can process only 2 blocks in parallel.

In that case, which one is the correct description of how our GPU works?

Upvotes: 0

Answers (1)

Jérôme Richard

Reputation: 50836

To quote this document from Nvidia:

- Threadblocks are assigned to SMs
  - Assignment happens only if an SM has sufficient resources for the entire threadblock
    - Resources: registers, SMEM, warp slots
  - Threadblocks that haven’t been assigned wait for resources to free up
  - The order in which threadblocks are assigned is not defined
    - Can and does vary between architectures

Thus, without more information, both schedules are theoretically possible. In practice, the situation is even more complex, since a GPU has many SMs and, AFAIK, each SM can now execute multiple blocks concurrently.
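To make the effect of the undefined assignment order concrete, here is a small toy simulation in Python. It is only a sketch under stated assumptions, not a model of the real hardware scheduler: it assumes the GPU has 2 interchangeable "slots", that a waiting block starts as soon as any slot frees up (matching the "wait for resources to free up" rule quoted above), and that blocks are independent. The function name `simulate` and the two assignment orders tried are hypothetical choices for illustration.

```python
import heapq

def simulate(durations, slots, order=None):
    """Toy model: return {block_index: (start, end)}.

    Assumption: a block is assigned to whichever slot frees up first,
    and blocks are handed out in the given order (the real order is
    not defined by CUDA).
    """
    if order is None:
        order = range(len(durations))
    free_at = [0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    timeline = {}
    for b in order:
        start = heapq.heappop(free_at)   # earliest available slot
        end = start + durations[b]
        timeline[b] = (start, end)
        heapq.heappush(free_at, end)
    return timeline

durations = [10, 2, 3, 4]          # b1..b4 from the question (0-indexed here)

# Blocks handed out in index order: b1, b2, b3, b4.
in_order = simulate(durations, slots=2)
# A different, equally legal order: b2, b3, b4, b1.
reordered = simulate(durations, slots=2, order=[1, 2, 3, 0])

total_in_order = max(end for _, end in in_order.values())
total_reordered = max(end for _, end in reordered.values())
print(in_order, total_in_order)
print(reordered, total_reordered)
```

In this toy model the total time is 10 units when the blocks are handed out in index order and 13 units when the long block happens to be assigned last, so the same kernel can finish at different times depending on an order the programmer does not control.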

Upvotes: 1
