Mehdi Saman Booy

Reputation: 2910

CUDA kernel call in a simple sample

This is the first parallel code example from CUDA by Example.

Can anyone explain the kernel call <<< N , 1 >>> to me?

This is the code, with only the important parts shown:

#define N   10

__global__ void add( int *a, int *b, int *c ) {
    int tid = blockIdx.x;    // one thread per block, so the block index identifies the element
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

int main( void ) {
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    // allocate the memory on the GPU
    // fill the arrays 'a' and 'b' on the CPU
    // copy the arrays 'a' and 'b' to the GPU

    add<<<N,1>>>( dev_a, dev_b, dev_c );

    // copy the array 'c' back from the GPU to the CPU
    // display the results
    // free the memory allocated on the GPU

    return 0;
}

Why does it use <<< N , 1 >>>, meaning N blocks with 1 thread in each block? We could instead write <<< 1 , N >>> and use 1 block with N threads, which seems like it would be better optimized.
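
To make the comparison concrete, a <<< 1 , N >>> launch would index by threadIdx.x rather than blockIdx.x. A minimal sketch of how the kernel and the call would change (same N, same device pointers):

__global__ void add( int *a, int *b, int *c ) {
    int tid = threadIdx.x;   // single block, so the thread index identifies the element
    if (tid < N)
        c[tid] = a[tid] + b[tid];
}

add<<<1,N>>>( dev_a, dev_b, dev_c );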

Upvotes: 3

Views: 18792

Answers (1)

kroneml

Reputation: 687

For this little example, there is no particular reason (as Bart already told you in the comments). But for a larger, more realistic example you should always keep in mind that the number of threads per block is limited (512 on the early hardware CUDA by Example targets, 1024 on current GPUs). That is, if you used N = 10000, you could not use <<<1,N>>> anymore, but <<<N,1>>> would still work.
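
For example, the usual way to scale past that limit is to combine blocks and threads: launch enough blocks of a fixed size (256 here is an arbitrary choice for illustration, not something from the book) and compute a global index inside the kernel:

#define N                 10000
#define THREADS_PER_BLOCK 256

__global__ void add( int *a, int *b, int *c ) {
    // global index = block offset + position within the block
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < N)          // guard against the extra threads in the last block
        c[tid] = a[tid] + b[tid];
}

// round up the block count so every element is covered
add<<<(N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>( dev_a, dev_b, dev_c );

This keeps the per-block thread count well under the hardware limit while still putting many threads in each block, which is generally what you want.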

Upvotes: 5
