Kernel calls in CUDA

I am new to CUDA, and I am confused about kernel calls.

When you call a kernel you specify the number of blocks and the number of threads per block, like this: kernelMethod<<< block, Threads >>>(parameters);

So why is it possible to use a 3rd parameter? kernelMethod<<< block, Threads, ??? >>>(parameters);

Using cudaDeviceProp you can read the maximum number of threads per block from the maxThreadsPerBlock field. But how can I know the maximum number of blocks? Thanks!

Upvotes: 0

Views: 885

Answers (1)

Robert Crovella

Reputation: 151809

The third parameter specifies the amount of shared memory per block to be dynamically allocated. The programming guide provides additional detail about shared memory, as well as a description and example.

Shared memory can be allocated statically in a kernel:

__shared__ int myints[256];

or dynamically:

extern __shared__ int myints[];

In the latter case, you must pass the size in bytes of the shared memory to be allocated as an additional kernel configuration parameter (the 3rd parameter you mention).

The pointer myints then points to the beginning of that dynamically allocated region.
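
As a minimal sketch of how the pieces fit together (the kernel name sumKernel and the per-block sum it computes are just made up for illustration, not something you have to use), a kernel using the dynamically allocated region and its launch might look like this:

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each block sums its portion of the input
// using a dynamically sized shared-memory buffer.
__global__ void sumKernel(const int *in, int *out, int n)
{
    extern __shared__ int myints[];   // size set by the 3rd launch parameter

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    myints[tid] = (gid < n) ? in[gid] : 0;
    __syncthreads();

    // Simple tree reduction within the block
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            myints[tid] += myints[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = myints[0];
}

int main()
{
    const int n = 1024, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemset(d_in, 0, n * sizeof(int));

    // The 3rd configuration parameter is the dynamic shared memory
    // size in bytes: here, one int per thread in the block.
    sumKernel<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}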

The maximum number of blocks is specified per grid dimension (x, y, z) and can also be obtained through the device properties query, in the maxGridSize field. You may want to refer to the deviceQuery sample code for a worked example.
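
As a quick illustration (a minimal sketch, not the deviceQuery sample itself), the relevant fields can be printed like this:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // properties of device 0

    printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
    printf("maxGridSize: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}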

Upvotes: 5
