Reputation: 149
I am writing a CUDA program that will likely run on many different GPUs. I would like to know whether CUDA provides a way to query, from code (either at runtime or at compile time), the capabilities of the current GPU, such as the maximum number of threads a single block can contain and the maximum number of blocks, so I can tailor the kernel launch to make optimal use of all the available resources.
I know it may sound like a silly question, but I can't find any answers online.
Bonus question, if this is not possible: I see here that someone says the Jetson TX1 has 2 SMs, each with 128 cores. I have read that per SM (of which I understand there are 2) there can be a maximum of 16 active blocks and 64 active warps (i.e., 2048 active threads).
How can I find this info for a given GPU?
Upvotes: 1
Views: 621
Reputation: 7157
cudaGetDeviceProperties seems to be what you are looking for.
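A minimal sketch of how you might use it, assuming you just want to print the limits for every visible device (the fields read below, such as maxThreadsPerBlock, maxThreadsPerMultiProcessor and multiProcessorCount, are members of cudaDeviceProp; compile with nvcc):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        fprintf(stderr, "No CUDA-capable device found\n");
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  Multiprocessors (SMs):  %d\n", prop.multiProcessorCount);
        printf("  Max threads per block:  %d\n", prop.maxThreadsPerBlock);
        printf("  Max threads per SM:     %d\n", prop.maxThreadsPerMultiProcessor);
        printf("  Warp size:              %d\n", prop.warpSize);
        printf("  Max block dims:         %d x %d x %d\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        printf("  Max grid dims:          %d x %d x %d\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}
```

For the bonus question: multiProcessorCount gives the number of SMs and maxThreadsPerMultiProcessor the active-thread limit per SM; if I recall correctly, newer toolkits (CUDA 11+) also expose maxBlocksPerMultiProcessor for the active-block limit. At compile time, device code can additionally branch on the __CUDA_ARCH__ macro, but for tailoring a launch to whatever GPU the program ends up running on, the runtime query above is usually the way to go.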
Upvotes: 4