cudaOccupancyMaxPotentialBlockSize API template usage

Question

I'm trying to understand the CUDA occupancy API cudaOccupancyMaxPotentialBlockSize.

The templated version is defined as follows:

template<class T>
__inline__ __host__ CUDART_DEVICE cudaError_t
cudaOccupancyMaxPotentialBlockSize(
    int    *minGridSize,
    int    *blockSize,
    T       func,
    size_t  dynamicSMemSize = 0,
    int     blockSizeLimit = 0)
{
    return cudaOccupancyMaxPotentialBlockSizeVariableSMem(minGridSize, blockSize, func, __cudaOccupancyB2DHelper(dynamicSMemSize), blockSizeLimit);
}

I haven't found many examples of how to use the API when the kernel is a template.

I found one in our code base:

template<typename T> __global__ void fn(T *a) { *a = 10; }

cudaOccupancyMaxPotentialBlockSize<void(*)(int*)>(&gridSize, &blockSize, fn, 0, 0);

In this case, void(*)(int*) is the type of a pointer to the kernel: a function that returns void and takes an int* argument. Is my understanding correct?

If so, since a kernel's return type is always void, does the template argument <void(*)(int*)> always start with void, followed by the kernel's parameter types (here int*) in order?

Can anyone explain, with examples, how to use this API when the kernel is a template?

IGarFieldI · Accepted Answer

void(*)(int*) is regular C(++) function-pointer syntax, so yes, it will always be void(*)(Args...). Alternatively, you can put the template arguments on the function pointer itself and let template argument deduction take over: cudaOccupancyMaxPotentialBlockSize(&gridSize, &blockSize, &fn<int>, 0, 0)
