I'm trying to understand the CUDA occupancy API `cudaOccupancyMaxPotentialBlockSize`.
The templated version is defined as follows:
```cuda
template<class T>
__inline__ __host__ CUDART_DEVICE cudaError_t
cudaOccupancyMaxPotentialBlockSize(
    int    *minGridSize,
    int    *blockSize,
    T       func,
    size_t  dynamicSMemSize = 0,
    int     blockSizeLimit = 0)
{
    return cudaOccupancyMaxPotentialBlockSizeVariableSMem(minGridSize, blockSize, func, __cudaOccupancyB2DHelper(dynamicSMemSize), blockSizeLimit);
}
```
I haven't found many examples of how to use the API when the kernel is a template.
I found one in our code:
```cuda
template <typename T> __global__ void fn(T *a) { *a = 10; }

cudaOccupancyMaxPotentialBlockSize<void(*)(int *)>(&gridSize, &blockSize, fn, 0, 0);
```
In this case, `void(*)(int *)` is the type of a pointer to the kernel: a function that returns `void` and takes an `int *` argument. Is my understanding correct?
If so, since a kernel's return type is always `void`, does the template argument always start with `void`, followed by the kernel's parameter types in order (such as `int *` here)?
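One way to check this reading is with compile-time assertions. The sketch below assumes compilation with nvcc. The second kernel `axpy` and the helper `asKernelArg` are hypothetical: `asKernelArg` only mimics how `cudaOccupancyMaxPotentialBlockSize` receives its `func` argument as a value of template type `T`, and the real occupancy API is not called here.

```cuda
#include <type_traits>
#include <cuda_runtime.h>

// The templated kernel from the question.
template <typename T>
__global__ void fn(T *a) { *a = 10; }

// Hypothetical kernel with several parameters, used only to show that the
// pointer type lists every kernel parameter, in order, after void(*).
template <typename T>
__global__ void axpy(T alpha, const T *x, T *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = alpha * x[i] + y[i];
}

// The address of a kernel specialization has type void (*)(its parameter types).
static_assert(std::is_same<decltype(&fn<int>), void (*)(int *)>::value,
              "fn<int> has type void(int *)");
static_assert(std::is_same<decltype(&axpy<double>),
                           void (*)(double, const double *, double *, int)>::value,
              "axpy<double> has type void(double, const double *, double *, int)");

// Hypothetical helper: like cudaOccupancyMaxPotentialBlockSize, it takes the
// kernel as a value of an arbitrary template type T; it just returns it.
template <class T>
T asKernelArg(T func) { return func; }

// Deduced form: &fn<float> already names a single function, so T is deduced
// as void (*)(float *). Passing plain `fn` with no template arguments would
// not compile, because T cannot be deduced from a function template name.
static_assert(std::is_same<decltype(asKernelArg(&fn<float>)), void (*)(float *)>::value,
              "T is deduced from the named specialization");

// Explicit form: T is fixed to void (*)(int *), so the name fn is resolved to
// the matching specialization fn<int>; the returned pointer is &fn<int>.
bool explicitFormSelectsFnInt()
{
    return asKernelArg<void (*)(int *)>(fn) == &fn<int>;
}
```

The explicit form works because the compiler picks the specialization of `fn` whose type matches the already-known parameter type, which is the same mechanism the snippet from the question relies on.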
Can anyone explain, with examples, how to use this API when the kernel is a template?
Answer:
`void(*)(int *)` is regular C(++) function-pointer syntax, so yes, it will always be `void(*)(Args...)`, where `Args...` are the kernel's parameter types in order. Alternatively, you can put the template arguments on the kernel name itself and let template argument deduction take over (here `T` is deduced as `void(*)(float *)`):

```cuda
cudaOccupancyMaxPotentialBlockSize(&gridSize, &blockSize, &fn<float>, 0, 0);
```
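A fuller sketch of both forms, assuming nvcc and a CUDA-capable device. The kernel `fillKernel` and the wrapper `fillWithSuggestedBlockSize` are hypothetical names for illustration. The sketch asks the occupancy API for a block size, then sizes the grid from the problem size `n` (rounded up) rather than launching exactly `minGridSize` blocks. Since both calls describe the same kernel, it treats differing block sizes as a failure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical templated kernel: writes value to out[0..n).
template <typename T>
__global__ void fillKernel(T *out, T value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical wrapper: returns 0 on success, -1 on a sanity-check failure,
// or the (positive) cudaError_t value of the first CUDA call that failed.
int fillWithSuggestedBlockSize(int n)
{
    if (n <= 0) return -1;

    int minGridSize = 0;  // smallest grid size that reaches maximum occupancy
    int blockSize = 0;    // block size that maximizes occupancy

    // Form 1: spell out the kernel's pointer type; the name fillKernel is then
    // resolved to fillKernel<float>, the specialization that matches it.
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize<void (*)(float *, float, int)>(
        &minGridSize, &blockSize, fillKernel, 0, 0);
    if (err != cudaSuccess) return static_cast<int>(err);

    // Form 2: name the specialization and let T be deduced as
    // void (*)(float *, float, int).
    int minGridSize2 = 0, blockSize2 = 0;
    err = cudaOccupancyMaxPotentialBlockSize(&minGridSize2, &blockSize2,
                                             &fillKernel<float>, 0, 0);
    if (err != cudaSuccess) return static_cast<int>(err);
    if (blockSize2 != blockSize) return -1;  // same kernel, so same answer

    float *d_out = nullptr;
    err = cudaMalloc(&d_out, static_cast<size_t>(n) * sizeof(float));
    if (err != cudaSuccess) return static_cast<int>(err);

    // Size the grid from the problem size, rounding up to cover all n elements.
    int gridSize = (n + blockSize - 1) / blockSize;
    fillKernel<float><<<gridSize, blockSize>>>(d_out, 1.0f, n);
    err = cudaGetLastError();                               // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    cudaFree(d_out);
    if (err != cudaSuccess) return static_cast<int>(err);

    std::printf("blockSize=%d minGridSize=%d gridSize=%d\n",
                blockSize, minGridSize, gridSize);
    return 0;
}
```

Calling `fillWithSuggestedBlockSize(1 << 20)` should print the chosen sizes and return 0. The exact numbers depend on the GPU and on the kernel's resource usage.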