sp497

Reputation: 2491

Is it possible to launch a CUDA kernel with grid/block size defined at runtime?

I would like to know whether it's possible to launch a CUDA kernel so that the grid and block sizes can be specified at run time instead of at compile time, as usual.

Any help regarding this would be greatly appreciated.

Upvotes: 2

Views: 2765

Answers (2)

sgarizvi

Reputation: 16816

In CUDA applications, it is rarely useful to hard-code the grid size. Most of the time the block size is fixed and the grid size is computed dynamically from the input data size. Consider the following vector-addition example.

__global__ void kernel(float* a, float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    //Bounds check: the last block may contain threads past the end of the vector
    if (tid < length)
       c[tid] = a[tid] + b[tid];
}

int addVectors(float* a, float* b, float* c, int length)
{
   //a, b, c are allocated on the device

   //Fix the block size to an appropriate value
   dim3 block(128);

   dim3 grid;
   grid.x = (length + block.x - 1)/block.x;

   //Grid size depends on the length of the vector.
   //The total number of threads is rounded up to the nearest multiple of the block size,
   //so it is at least equal to the length of the vector.

   kernel<<<grid,block>>>(a,b,c,length);

   return 0;
}

Upvotes: 4

perreal

Reputation: 98118

CUDA kernels and device functions can use blockDim.{x,y,z} to access the block configuration, as well as gridDim.{x,y,z} to access the grid configuration. If you have a kernel/device function that can cope with various configurations, then all you need to do is launch it (myKernel<<<dimGrid,dimBlock>>>) with whatever dimGrid or dimBlock you choose at run time. I don't think this is unusual at all.
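As a minimal sketch of this (kernel name and command-line handling are made up for illustration), a grid-stride loop makes the kernel correct for any configuration chosen at run time:

```cuda
#include <cstdio>
#include <cstdlib>

// Each thread strides by the total thread count (gridDim.x * blockDim.x),
// so the kernel works for any grid/block configuration.
__global__ void fillKernel(int *out, int n, int value)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] = value;
}

int main(int argc, char **argv)
{
    int n = 1 << 20;

    // Block and grid sizes come from the command line, not compile time.
    int block = (argc > 1) ? atoi(argv[1]) : 256;
    int grid  = (argc > 2) ? atoi(argv[2]) : (n + block - 1) / block;

    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));

    dim3 dimBlock(block), dimGrid(grid);
    fillKernel<<<dimGrid, dimBlock>>>(d_out, n, 42);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}
```

Running the same binary with, say, `./fill 128 64` or `./fill 512` changes the launch configuration without recompiling.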

Upvotes: 3
