user2093311

Reputation: 53

Determine Max Threads dynamically in CUDA?

Is it possible to determine the maximum number of threads per block dynamically, i.e. with a function that queries the GPU for the value so I can store it in a variable? Thanks for your help.


Thanks, I determined the max number of threads with the following code:

int dev = 0;
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, dev);

unsigned int maxThreads = deviceProp.maxThreadsPerBlock;

and with this number I calculate the blocks and threads for my kernel with these lines:

unsigned int blocksNum = 1+((mSize-1)/maxThreads); // mSize is the size of array
unsigned int threadsNum = 1+((mSize-1)/blocksNum);
dim3 dimGrid(blocksNum, 1, 1);
dim3 dimBlock(threadsNum, 1, 1);
...
kernel<<<dimGrid,dimBlock>>>();

Is this form correct to call the kernel?
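
To make the arithmetic above easier to check, here is a minimal host-side sketch of the same two formulas. The names LaunchConfig and evenLaunchConfig are hypothetical and not part of CUDA; the sketch assumes mSize and maxThreads are both greater than zero.

```cpp
// Hypothetical helper type: a 1-D launch configuration.
struct LaunchConfig {
    unsigned int blocks;
    unsigned int threads;
};

// Same formulas as in the question: ceiling-divide the element count by the
// per-block limit, then spread the elements evenly over that many blocks.
// Assumes mSize > 0 and maxThreads > 0.
LaunchConfig evenLaunchConfig(unsigned int mSize, unsigned int maxThreads)
{
    LaunchConfig cfg;
    cfg.blocks  = 1 + (mSize - 1) / maxThreads;  // ceil(mSize / maxThreads)
    cfg.threads = 1 + (mSize - 1) / cfg.blocks;  // ceil(mSize / blocks), never above maxThreads
    return cfg;
}
```

For example, with mSize = 3001 and maxThreads = 1024 this gives 3 blocks of 1001 threads, i.e. 3003 threads for 3001 elements. Because the product can exceed mSize, the kernel would still need a bounds check such as if (i < mSize) before touching the array.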

Thanks for your help.


OK, I'm using Nvidia's sum reduction kernel number 6 together with the sample code, which determines the threads and blocks as follows:

unsigned int threadsNum = (mSize < maxThreads*2) ? nextPow2((mSize + 1)/ 2) : maxThreads;
unsigned int blocksNum = (mSize + (threadsNum * 2 - 1)) / (threadsNum * 2);

This code works with my array.
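
As a rough illustration of what those two lines compute, here is a host-side sketch. It assumes nextPow2 is the usual round-up-to-the-next-power-of-two helper for 32-bit values; ReductionConfig and reductionConfig are hypothetical names introduced only for this example.

```cpp
// Assumed behaviour of nextPow2: round x up to the next power of two
// (returns x unchanged if it already is one). 32-bit values only.
unsigned int nextPow2(unsigned int x)
{
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return ++x;
}

// Hypothetical helper type for the result.
struct ReductionConfig {
    unsigned int threads;
    unsigned int blocks;
};

// Same formulas as above: the block count is a ceiling division by
// threads * 2, so each block is sized to cover 2 * threads elements.
ReductionConfig reductionConfig(unsigned int mSize, unsigned int maxThreads)
{
    ReductionConfig cfg;
    cfg.threads = (mSize < maxThreads * 2) ? nextPow2((mSize + 1) / 2) : maxThreads;
    cfg.blocks  = (mSize + (cfg.threads * 2 - 1)) / (cfg.threads * 2);
    return cfg;
}
```

With mSize = 1000 and maxThreads = 1024 this yields 512 threads in 1 block; with mSize = 5000 it yields 1024 threads in 3 blocks. The thread count is always a power of two here as long as maxThreads is one.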

Upvotes: 0

Views: 1703

Answers (4)

Joe

Reputation: 6757

What you need is cudaFuncGetAttributes if you are using the CUDA runtime API, or cuFuncGetAttribute with CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK (as RoBiK pointed out in his answer) if you are using the CUDA driver API. Both functions are documented in the Execution Control section of the respective API documentation.
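
A minimal sketch of the runtime API variant, in case it helps. scaleKernel is a hypothetical kernel that exists only so there is something to query; the per-kernel limit reported here can be lower than the device-wide maxThreadsPerBlock, for instance when the kernel uses many registers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so that there is a function to query.
__global__ void scaleKernel(float *data, float factor, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Ask the runtime for the attributes of this specific kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Largest block size with which this kernel can be launched on the current device.
    std::printf("max threads per block for scaleKernel: %d\n",
                attr.maxThreadsPerBlock);
    std::printf("registers per thread: %d\n", attr.numRegs);
    return 0;
}
```

The value in attr.maxThreadsPerBlock could then replace deviceProp.maxThreadsPerBlock when sizing the launch for that kernel.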

Upvotes: 0

RoBiK

Reputation: 1730

You can use the Driver API to access the properties of a particular kernel (called a Function in Driver API terminology).

Use the API call cuFuncGetAttribute with the CUfunction_attribute value equal to CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK.

This gives you:

The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.

Upvotes: 3

Tom

Reputation: 21108

Query the device properties and look at maxThreadsPerBlock.

Upvotes: 3

Robert Crovella

Reputation: 151799

Yes, the value (maxThreadsPerBlock) is one of the properties returned by cudaGetDeviceProperties. For a fully worked example, take a look at the deviceQuery sample.

Upvotes: 1
