fyl

Reputation: 69

Overloading CUDA kernel functions

I encountered a problem when using overloaded kernel functions in CUDA.

I understand that CUDA can launch an overloaded kernel function and resolve the overload from its arguments.

However, I would like to use cudaOccupancyMaxPotentialBlockSize() to calculate the block size for maximum occupancy (see the documentation):

__global__ void foo_cuda_kernel(int a)
{
  /*implementation 1*/
}

//overloaded kernel function
__global__ void foo_cuda_kernel(int a, int b)
{
  /*implementation 2*/
}

void foo_cuda(int thread_num)
{
  int min_grid_size, grid_size, block_size;
  cudaOccupancyMaxPotentialBlockSize
  (
    &min_grid_size, &block_size, 
    foo_cuda_kernel, //how does it distinguish overloaded functions?
    0, thread_num
  );
  grid_size = (thread_num + block_size - 1) / block_size;
  
  //the compiler can distinguish the launched function by its arguments
  foo_cuda_kernel<<<grid_size, block_size>>>((int)1);
  cudaDeviceSynchronize();
}

How do I make this work? How does cudaOccupancyMaxPotentialBlockSize() distinguish between overloaded functions?

Upvotes: 0

Views: 572

Answers (1)

talonmies

Reputation: 72342

As noted in the comments, you can cast the function name to a pointer to the correct overload:

auto foo_ii = static_cast<void (*)(int, int)>(&foo_cuda_kernel);
auto foo_i = static_cast<void (*)(int)>(&foo_cuda_kernel);

You then pass either foo_i or foo_ii to cudaOccupancyMaxPotentialBlockSize, depending on which version of the function you require.

This will work because the toolchain silently emits host boilerplate functions which wrap the underlying runtime API calls to run a kernel and enforce kernel argument type checking. The host compiler treats these wrappers like any other host function (because they are), and selects the matching version automagically.
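For completeness, here is a minimal sketch of the question's foo_cuda() put together with the cast. thread_num is an assumed parameter giving the total number of threads to launch, and the kernel bodies are left empty as in the question:

#include <cuda_runtime.h>

__global__ void foo_cuda_kernel(int a)
{
  /*implementation 1*/
}

__global__ void foo_cuda_kernel(int a, int b)
{
  /*implementation 2*/
}

void foo_cuda(int thread_num)
{
  int min_grid_size, grid_size, block_size;

  //select the void(int) overload explicitly before handing it to the
  //occupancy calculator; the cast resolves the ambiguity
  auto foo_i = static_cast<void (*)(int)>(&foo_cuda_kernel);

  cudaOccupancyMaxPotentialBlockSize
  (
    &min_grid_size, &block_size,
    foo_i,
    0, thread_num
  );
  grid_size = (thread_num + block_size - 1) / block_size;

  //the launch itself is disambiguated by the argument list, as usual
  foo_cuda_kernel<<<grid_size, block_size>>>(1);
  cudaDeviceSynchronize();
}

To query the occupancy of the two-argument version instead, pass foo_ii in place of foo_i.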

Upvotes: 1
