pylua

Reputation: 635

Programmatically retrieve maximum number of blocks per multiprocessor

Is there a way to programmatically retrieve the maximum number of blocks that can be resident on a multiprocessor? I understand that to reach that maximum, I need to work out how many threads, how many registers, and how much shared memory each block can use without limiting the number of resident blocks.

However, I have looked at the cudaDeviceProp documentation:

http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/structcudaDeviceProp.html

and I do not see a way to programmatically retrieve the maximum number of blocks per multiprocessor.

Is there a way to do this?

Upvotes: 4

Views: 856

Answers (1)

Robert Crovella

Reputation: 151799

As far as I know there is no API function to retrieve this number directly.

You could write your own function that retrieves the compute capability major version and applies the limits listed in the programming guide: 8 blocks per multiprocessor for compute capability 1.x or 2.x, 16 blocks for 3.x, and 32 blocks for 5.x.

That doesn't future-proof your code, but it may be about the best possible method.
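A minimal sketch of such a lookup, using only the values quoted above. The function name `maxBlocksPerSM` and the `-1` "unknown" return value are hypothetical choices for illustration, not part of the CUDA API, and the table deliberately does not guess at architectures not mentioned here. As far as I'm aware, newer toolkits (CUDA 11 and later) also expose a `maxBlocksPerMultiProcessor` field in `cudaDeviceProp`, which would be preferable where it is available.

```cpp
// Hypothetical helper, not part of the CUDA runtime API.
// ccMajor is the compute capability major version, e.g. the value of
// cudaDeviceProp::major filled in by cudaGetDeviceProperties().
// Returns the maximum number of resident blocks per multiprocessor,
// or -1 when the major version is not covered by the table below.
int maxBlocksPerSM(int ccMajor) {
    switch (ccMajor) {
        case 1:
        case 2:
            return 8;   // compute capability 1.x and 2.x
        case 3:
            return 16;  // compute capability 3.x
        case 5:
            return 32;  // compute capability 5.x
        default:
            return -1;  // unknown here: consult the programming guide
    }
}
```

In a CUDA program you would typically call `cudaGetDeviceProperties(&prop, device)` on a `cudaDeviceProp prop` and pass `prop.major` to this function. Returning a sentinel for unknown versions makes the lack of future-proofing explicit instead of silently returning a stale number.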

Upvotes: 3
