
Reputation: 101

How does a CUDA kernel work on multiple blocks, each of which has a different execution time?

Assume that we run a kernel function with 4 blocks {b1, b2, b3, b4}, which require {10, 2, 3, 4} units of time to complete their work, and that our GPU can process only 2 blocks in parallel.

If so, which of the following correctly describes how our GPU works?

[Image: two candidate block-scheduling timelines]

Upvotes: 0

Views: 269

Answers (1)

Jérôme Richard

Reputation: 50836

To quote this document from Nvidia:

Threadblocks are assigned to SMs

  • Assignment happens only if an SM has sufficient resources for the entire threadblock
    • Resources: registers, SMEM, warp slots
    • Threadblocks that haven’t been assigned wait for resources to free up
  • The order in which threadblocks are assigned is not defined
    • Can and does vary between architectures

Thus, without more information, both schedulings are theoretically possible. In practice, it is even more complex, since a GPU contains many SMs and, AFAIK, each SM can nowadays execute multiple blocks concurrently.
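The "assigned when resources free up" behaviour quoted above can be simulated to see why the assignment order matters. Below is a minimal sketch (not how the hardware actually works: it ignores SMs, registers, and SMEM, and models the GPU simply as 2 slots that each run one block at a time). The `order` parameter is exactly the thing Nvidia leaves undefined:

```python
import heapq

def simulate(block_times, num_slots, order):
    """Greedy scheduler sketch: blocks are assigned, in the given
    order, to whichever slot becomes free first."""
    free_at = [0] * num_slots            # time at which each slot is free
    heapq.heapify(free_at)
    finish = [0] * len(block_times)
    for b in order:
        start = heapq.heappop(free_at)   # wait for the earliest-free slot
        finish[b] = start + block_times[b]
        heapq.heappush(free_at, finish[b])
    return finish                        # per-block completion times

times = [10, 2, 3, 4]                    # b1..b4 from the question
print(simulate(times, 2, [0, 1, 2, 3]))  # in-order assignment -> [10, 2, 5, 9]
print(simulate(times, 2, [3, 2, 1, 0]))  # reversed assignment -> [14, 5, 3, 4]
```

With in-order assignment the kernel finishes at t=10, with reversed assignment at t=14: same blocks, same hardware capacity, different timeline, which is why the question cannot be answered without knowing the (undefined) assignment order.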

Upvotes: 1
