LyingOnTheSky

Reputation: 2854

Why use blocks/grid instead of a for-loop?

Why use:

kernel<<<512, 512>>>(); // launched somewhere in host code
__global__ void kernel() { // kernels launched with <<<...>>> must be __global__
    Code();
}

Rather than:

kernel<<<1, 512>>>(512); // launched somewhere in host code
__global__ void kernel(int n) {
    // each of the 512 threads runs Code() n times
    for (int i = 0; i < n; ++i) {
        Code();
    }
}

NOTE: I don't have a CUDA GPU yet, so I can't test this myself.

Is the first version somehow faster? Can't GPU cores handle long-running threads, or do they lose speed the longer a thread runs?

I guess the second (for-loop) version is better when the desired number of iterations is not a multiple of the number of threads, since the last thread/core could simply run a different number of iterations.
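For what it's worth, the non-aligned case is also easy to handle with one thread per item: round the number of blocks up and let the surplus threads in the last block skip the work with a bounds check. Below is a minimal sketch, assuming a hypothetical `scaleKernel` that multiplies each element by a factor as a stand-in for `Code()`; the names and the block size of 256 are illustrative choices, not anything from the question.

```cuda
#include <cuda_runtime.h>

// One thread per element. n does not have to be a multiple of the block size.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                     // surplus threads in the last block do nothing
        data[i] *= factor;                           // hypothetical stand-in for Code()
    }
}

// Hypothetical host helper: copies data to the GPU, scales it, copies it back.
// Returns true if every CUDA call succeeded.
bool scaleOnDevice(float* hostData, int n, float factor) {
    if (n <= 0) return true;                         // nothing to do; avoids a 0-block launch
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
    const size_t bytes = (size_t)n * sizeof(float);

    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(d, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scaleKernel<<<blocks, threadsPerBlock>>>(d, n, factor);
        err = cudaGetLastError();                    // catches launch-configuration errors
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostData, d, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    }
    cudaFree(d);
    return err == cudaSuccess;
}
```

With n = 1000 this launches 4 blocks (1024 threads), and the last 24 threads fail the `i < n` check and exit immediately, so the cost of the misalignment is a handful of idle threads.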

Upvotes: 0

Views: 122

Answers (2)

Zzzoom

Reputation: 1

It's because of how threads are assigned to GPU execution resources. Blocks are distributed among the streaming multiprocessors (SMs) on the GPU, and each block runs in its entirety on a single SM. If you launched a grid with a single block, your whole kernel would run on one SM. This would be fine on a very small GPU with a single SM like the Tegra K1, but on most GPUs, which have multiple SMs (like the 24 on a GTX Titan X), all the other SMs would sit idle and you'd be wasting a considerable amount of resources.
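One way to keep the for-loop while still using every SM is a "grid-stride loop": launch several blocks per SM and let each thread step through the data by the total number of threads in the grid. This is a common CUDA idiom swapped in here for illustration, not something the answer itself proposes. The sketch below queries the SM count with `cudaDeviceGetAttribute`; the kernel name, the "add one" work standing in for `Code()`, and the choice of four blocks per SM are all assumptions.

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and advances by the
// total number of threads in the grid, so any n is covered by any grid size.
__global__ void addOneKernel(int* data, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] += 1;  // hypothetical stand-in for the question's Code()
    }
}

// Hypothetical helper: sizes the grid from the SM count instead of using one block.
// Returns the number of blocks launched, or -1 on error.
int addOneOnAllSMs(int* hostData, int n) {
    if (n <= 0) return 0;
    int device = 0, numSMs = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;
    if (cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device) != cudaSuccess) return -1;

    const int threadsPerBlock = 256;
    const int blocks = 4 * numSMs;  // assumption: a few blocks per SM; tune for real workloads
    const size_t bytes = (size_t)n * sizeof(int);

    int* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1;
    cudaError_t err = cudaMemcpy(d, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        addOneKernel<<<blocks, threadsPerBlock>>>(d, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostData, d, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    }
    cudaFree(d);
    return err == cudaSuccess ? blocks : -1;
}
```

On a 24-SM card this would launch 96 blocks of 256 threads (24576 threads), so for n = 100000 each thread runs the loop body about four times, and every SM gets work.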

Upvotes: 0

Kerrek SB

Reputation: 477140

The very idea of CUDA is that you should do parallel work in parallel. The entire execution architecture is designed to make that fast. Anything that is truly parallel, i.e. where all the parallel pieces execute exactly the same instructions in lockstep, is better done by executing the same instructions on many, many cores at once than by executing many instructions with complicated branching and looping on one core.
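Since the question mentions not having a GPU to try this on, here is a sketch of how the two launch shapes from the question could be compared once one is available, timing each with CUDA events. The `work` function is a hypothetical stand-in for `Code()` (a short arithmetic loop per item), and both kernels write the same 512 × 512 outputs so the comparison is like for like.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's Code(): a little arithmetic per item.
__device__ float work(int idx) {
    float x = (float)idx;
    for (int k = 0; k < 100; ++k) x = x * 0.999f + 1.0f;
    return x;
}

// First version from the question: 512 blocks x 512 threads, one item per thread.
__global__ void gridVersion(float* out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    out[idx] = work(idx);
}

// Second version: a single block of 512 threads, each looping n times.
__global__ void loopVersion(float* out, int n) {
    for (int i = 0; i < n; ++i) {
        int idx = i * blockDim.x + threadIdx.x;  // neighbouring threads touch neighbouring items
        out[idx] = work(idx);
    }
}

// Times one launch of each version with CUDA events (milliseconds).
// Returns false if any CUDA call or launch reported an error.
bool timeBothVersions(float* msGrid, float* msLoop) {
    const int threads = 512, blocks = 512, total = threads * blocks;
    float* d = nullptr;
    if (cudaMalloc(&d, total * sizeof(float)) != cudaSuccess) return false;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    gridVersion<<<blocks, threads>>>(d);  // warm-up, so one-time setup is not timed
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    gridVersion<<<blocks, threads>>>(d);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(msGrid, start, stop);

    cudaEventRecord(start);
    loopVersion<<<1, threads>>>(d, blocks);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(msLoop, start, stop);

    cudaError_t err = cudaGetLastError();  // last error recorded by any earlier call or launch
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return err == cudaSuccess;
}
```

No particular numbers are claimed here. On a GPU with many SMs the single-block version can only occupy one SM, so it would be expected to be slower, but the actual ratio depends on the card and on what `Code()` really does.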

I recommend you read the extensive documentation Nvidia publishes about CUDA, paying special attention to warps, shared-memory bank conflicts, local memory, branching, etc. Programming for GPUs is far from trivial, and as with any kind of concurrent programming, you should expect the process to be painful and expensive unless you have both an extremely good reason to be concurrent and a very good understanding of many of the low-level details.

Upvotes: 2
