Reputation: 2491
I would like to know whether it's possible to launch a CUDA kernel so that the grid/block size can be specified at run time, rather than at compile time as usual.
Any help regarding this would be greatly appreciated.
Upvotes: 2
Views: 2765
Reputation: 16816
In CUDA applications, it is rarely useful to hard-code the grid size. Most of the time the block size is fixed, while the grid size is computed at run time from the size of the input data. Consider the following example of vector addition.
__global__ void kernel(float* a, float* b, float* c, int length)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    //Bounds check inside the kernel
    if (tid < length)
        c[tid] = a[tid] + b[tid];
}
int addVectors(float* a, float* b, float* c, int length)
{
    //a, b, c are already allocated on the device
    //Fix the block size to an appropriate value
    dim3 block(128);
    dim3 grid;
    grid.x = (length + block.x - 1) / block.x;
    //The grid size depends on the length of the vector.
    //The total number of threads is rounded up to the nearest multiple
    //of the block size, so there are at least as many threads as there
    //are vector elements.
    kernel<<<grid, block>>>(a, b, c, length);
    return 0;
}
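For completeness, here is a minimal host-side sketch of how addVectors might be driven; the main function, the command-line parsing, and the cudaMemcpy placeholder are illustrative assumptions, not part of the answer above. The point is that the vector length arrives only at run time, which is exactly why the grid size is computed rather than hard-coded:

#include <cstdlib>

int main(int argc, char** argv)
{
    //Hypothetical driver: the length is only known at run time
    int length = (argc > 1) ? atoi(argv[1]) : 1000000;
    size_t bytes = length * sizeof(float);

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    //... fill d_a and d_b with input data via cudaMemcpy ...

    addVectors(d_a, d_b, d_c, length);
    cudaDeviceSynchronize();

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}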
Upvotes: 4
Reputation: 98118
CUDA kernels and device functions can use blockDim.{x,y,z} to access the block configuration, as well as gridDim.{x,y,z} to access the grid configuration. If you have a kernel or device function that can cope with various configurations, then all you need to do is launch the kernel (myKernel<<<dimGrid,dimBlock>>>) with whatever dimGrid and dimBlock you choose at run time. I don't think this is unusual at all.
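As an illustration, here is a minimal sketch of a launch whose configuration comes entirely from the command line; the kernel name printConfig and the argv handling are made up for this example (device-side printf requires compute capability 2.0 or newer):

#include <cstdio>
#include <cstdlib>

__global__ void printConfig()
{
    //Every thread can read the launch configuration at run time;
    //thread 0 of block 0 prints it once.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("gridDim.x = %d, blockDim.x = %d\n", gridDim.x, blockDim.x);
}

int main(int argc, char** argv)
{
    //Nothing about the launch configuration is fixed at compile time
    int numBlocks  = (argc > 1) ? atoi(argv[1]) : 4;
    int numThreads = (argc > 2) ? atoi(argv[2]) : 128;

    dim3 dimGrid(numBlocks);
    dim3 dimBlock(numThreads);

    printConfig<<<dimGrid, dimBlock>>>();
    cudaDeviceSynchronize();
    return 0;
}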
Upvotes: 3