Reputation: 117
The CUDA programming guide states:
The CUDA architecture is built around a scalable array of multithreaded Streaming Multiprocessors (SMs). When a CUDA program on the host CPU invokes a kernel grid, the blocks of the grid are enumerated and distributed to multiprocessors with available execution capacity. The threads of a thread block execute concurrently on one multiprocessor, and multiple thread blocks can execute concurrently on one multiprocessor. As thread blocks terminate, new blocks are launched on the vacated multiprocessors.
Does that mean that if I have a video card with 2 multiprocessors of n CUDA cores each, and I launch a kernel like
MyKernel<<<1,N>>>(sth);
one of the multiprocessors will be idle, since I'm launching only a single block of N threads?
Upvotes: 2
Views: 276
Reputation: 72349
You are correct.
In all current CUDA architectures, a block is only ever scheduled onto, and runs on, a single multiprocessor. If you run one block on a device with more than one multiprocessor, all but one of those multiprocessors will be idle.
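As a minimal sketch (the kernel body, the array, and the block size here are placeholders, not taken from your question), you can query the device's multiprocessor count with cudaGetDeviceProperties and launch at least that many blocks, so every SM has at least one block to work on:

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for MyKernel from the question.
__global__ void MyKernel(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Number of multiprocessors (SMs) on device 0.
    int numSMs = prop.multiProcessorCount;
    printf("Device 0 has %d multiprocessors\n", numSMs);

    const int threadsPerBlock = 256;

    // Launch at least one block per SM so no multiprocessor sits idle.
    // A single-block launch, <<<1, N>>>, can only ever occupy one SM.
    int numBlocks = numSMs;

    int *d_out;
    cudaMalloc(&d_out, numBlocks * threadsPerBlock * sizeof(int));

    MyKernel<<<numBlocks, threadsPerBlock>>>(d_out);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}

In practice you would usually launch many more blocks than SMs, so the hardware can keep each multiprocessor busy while some blocks wait on memory.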
Upvotes: 3