cudaOccupancyMaxActiveBlocksPerMultiprocessor is undefined

Question

I am trying to learn cuda and use it in an efficient way. And I have found a code from nvidia's website, which tells that we can learn what should be the block size that we should use for the device's most efficient usage. The code is as follows :

#include 

// Device code
__global__ void MyKernel(int *d, int *a, int *b)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    d[idx] = a[idx] * b[idx];
}

// Host code
int main()
{
    int numBlocks;        // Occupancy in terms of active blocks
    int blockSize = 32;

    // These variables are used to convert occupancy to warps
    int device;
    cudaDeviceProp prop;
    int activeWarps;
    int maxWarps;

    cudaGetDevice(&device);
    cudaGetDeviceProperties(&prop, device);

    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
    &numBlocks,
    MyKernel,
    blockSize,
    0);

    activeWarps = numBlocks * blockSize / prop.warpSize;
    maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;

    std::cout << "Occupancy: " << (double)activeWarps / maxWarps * 100 << "%" << std::endl;

    return 0;
}

However, when I compiled it, there is the following error :

Compile line :

nvcc ben_deneme2.cu -arch=sm_35 -rdc=true -lcublas -lcublas_device -lcudadevrt -o my

Error :

ben_deneme2.cu(25): error: identifier "cudaOccupancyMaxActiveBlocksPerMultiprocessor" is undefined

1 error detected in the compilation of "/tmp/tmpxft_0000623d_00000000-8_ben_deneme2.cpp1.ii".

Should I include a library for this, though I could not find a library name for this on the internet? Or am I doing something else wrong? Thanks in advance

pQB · Accepted Answer

The cudaOccupancyMaxActiveBlocksPerMultiprocessorfunction is included in CUDA 6.5. You have not access to that function if you have a previous version of CUDA installed, for example, it will not work for CUDA 5.5.

If you want to use that function you must update your CUDA version at least to to 6.5.

People using older versions usually use the Cuda Occupancy Calculator.

One common heuristic used to choose a good block size is to aim for high occupancy, which is the ratio of the number of active warps per multiprocessor to the maximum number of warps that can be active on the multiprocessor at once. -- CUDA Pro Tip: Occupancy API Simplifies Launch Configuration

cudaOccupancyMaxActiveBlocksPerMultiprocessor is undefined

Answers (1)

Related Questions