Reputation: 53
Is it possible to determine the maximum number of threads per block dynamically, i.e. is there a function that queries the GPU for this value so that I can store it in a variable? Thanks for your help.
Thanks, I determined the max number of threads with the following code:
int dev = 0;
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, dev);
unsigned int maxThreads = deviceProp.maxThreadsPerBlock;
and with this number I calculate the blocks and threads for my kernel with these lines:
unsigned int blocksNum = 1+((mSize-1)/maxThreads); // mSize is the size of array
unsigned int threadsNum = 1+((mSize-1)/blocksNum);
dim3 dimGrid(blocksNum, 1, 1);
dim3 dimBlock(threadsNum, 1, 1);
...
kernel<<<dimGrid,dimBlock>>>();
Is this the correct way to call the kernel?
Thanks for your help.
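To make the arithmetic above concrete, here is a minimal host-only sketch of the same two formulas. The names LaunchConfig and evenSplit are hypothetical and not part of the original post. It illustrates that blocksNum * threadsNum can exceed mSize (1025 elements with a limit of 1024 gives 2 blocks of 513 threads, i.e. 1026 threads), so a kernel launched this way would presumably need a guard such as if (i < mSize) before touching the array.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): reproduces the
// poster's arithmetic for a 1D launch over mSize elements.
struct LaunchConfig {
    unsigned int blocks;
    unsigned int threads;
};

LaunchConfig evenSplit(unsigned int mSize, unsigned int maxThreads) {
    assert(mSize > 0 && maxThreads > 0);  // the formulas assume non-empty input
    LaunchConfig cfg;
    // Smallest number of blocks such that no block exceeds maxThreads.
    cfg.blocks = 1 + (mSize - 1) / maxThreads;   // ceil(mSize / maxThreads)
    // Spread the elements evenly over those blocks.
    cfg.threads = 1 + (mSize - 1) / cfg.blocks;  // ceil(mSize / blocks)
    return cfg;
}
```

Note that the resulting thread count is usually not a multiple of the warp size (32); that is legal, but the last warp of each block is then only partially filled.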
OK, I'm using sum reduction kernel number 6 from the NVIDIA samples, and I used the sample code, which determines the threads and blocks with the following code:
unsigned int threadsNum = (mSize < maxThreads*2) ? nextPow2((mSize + 1)/ 2) : maxThreads;
unsigned int blocksNum = (mSize + (threadsNum * 2 - 1)) / (threadsNum * 2);
This code works with my array.
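For readers without the sample at hand, the two lines above can be sketched as a self-contained host-only function. This is an illustration under assumptions, not the sample's exact source: ReductionConfig and reductionConfig are hypothetical names, and nextPow2 is written here as the usual bit-twiddling round-up for 32-bit values. The factor of 2 reflects that each thread in this style of reduction loads two elements before the in-block reduction starts.

```cpp
#include <cassert>

// Round x up to the next power of two (x itself if it already is one).
// Valid for 1 <= x <= 2^31.
unsigned int nextPow2(unsigned int x) {
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return ++x;
}

// Hypothetical wrapper around the two lines quoted in the post.
struct ReductionConfig {
    unsigned int blocks;
    unsigned int threads;
};

ReductionConfig reductionConfig(unsigned int mSize, unsigned int maxThreads) {
    assert(mSize > 0 && maxThreads > 0);
    ReductionConfig cfg;
    // Small inputs: one block with a power-of-two thread count covering half the elements.
    cfg.threads = (mSize < maxThreads * 2) ? nextPow2((mSize + 1) / 2) : maxThreads;
    // Each block consumes threads * 2 elements per pass; round up.
    cfg.blocks = (mSize + (cfg.threads * 2 - 1)) / (cfg.threads * 2);
    return cfg;
}
```

With a limit of 1024 threads, 1000 elements yield 1 block of 512 threads, and 5000 elements yield 3 blocks of 1024 threads.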
Upvotes: 0
Views: 1703
Reputation: 6757
What you need is cudaFuncGetAttributes if you are using the CUDA runtime API, or cuFuncGetAttribute with CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK if you are using the CUDA driver API, as RoBiK pointed out in his answer. Unlike the device-wide limit, the value these functions report is specific to one kernel. Both functions are documented in the Execution Control section of the respective API documentation.
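As a rough sketch of the runtime API route, the following queries the per-kernel limit for a placeholder kernel. The kernel scaleKernel and the helper kernelMaxThreads are hypothetical names introduced only for illustration; the API call and the maxThreadsPerBlock field of cudaFuncAttributes are part of the CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; any __global__ function can be queried.
__global__ void scaleKernel(float *data, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the grid may cover more threads than elements
        data[i] *= 2.0f;
    }
}

// Returns the largest block size with which scaleKernel can be launched
// on the current device, or -1 if the query fails.
int kernelMaxThreads() {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    return attr.maxThreadsPerBlock;
}
```

The returned value can be lower than deviceProp.maxThreadsPerBlock, for example when the kernel uses many registers per thread, which is why querying the kernel itself is the safer basis for choosing a block size.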
Upvotes: 0
Reputation: 1730
You can use the Driver API to access the properties of a particular kernel (called a function in Driver API terminology).
Use the API call cuFuncGetAttribute with the CUfunction_attribute value equal to CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.
This gives you:
The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.
Upvotes: 3
Reputation: 151799
Yes, the value (maxThreadsPerBlock) is one of the properties returned by cudaGetDeviceProperties. For a fully worked example, take a look at the deviceQuery sample.
Upvotes: 1