Reputation: 2910
It's the first parallel code of cuda by example .
Can any one describe me about the kernel call : <<< N , 1 >>>
This is the code with important points :
#define N 10
__global__ void add( int *a, int *b, int *c ) {
int tid = blockIdx.x; // this thread handles the data at its thread id
if (tid < N)
c[tid] = a[tid] + b[tid];
}
int main( void ) {
int a[N], b[N], c[N];
int *dev_a, *dev_b, *dev_c;
// allocate the memory on the GPU
// fill the arrays 'a' and 'b' on the CPU
// copy the arrays 'a' and 'b' to the GPU
add<<<N,1>>>( dev_a, dev_b, dev_c );
// copy the array 'c' back from the GPU to the CPU
// display the results
// free the memory allocated on the GPU
return 0;
}
Why it used of <<< N , 1 >>>
that it means we used of N blocks and 1 thread in each block ?? since we can write this <<< 1 , N >>>
and used 1 block and N thread in this block for more optimization.
Upvotes: 3
Views: 18792
Reputation: 687
For this little example, there is no particular reason (as Bart already told you in the comments). But for a larger, more realistic example you should always keep in mind that the number of threads per block is limited. That is, if you use N = 10000, you could not use <<<1,N>>>
anymore, but <<<N,1>>>
would still work.
Upvotes: 5