einpoklum

Reputation: 131544

How can I launch a kernel with "as much dynamic shared mem as is possible"?

We know CUDA devices have very limited shared memory capacities, in the tens of kilobytes only. We also know that kernels won't launch (typically? ever?) if you ask for too much shared memory. And we also know that the available shared memory is consumed both by the static allocations in your code and by the dynamically-allocated shared memory you request at launch.

Now, cudaGetDeviceProperties() gives us the overall space we have. But, given a function symbol, is it possible to determine how much statically-allocated shared memory it would use, so that I can "fill up" the shared mem to full capacity on launch? If not, is there a possibility of having CUDA take care of this for me somehow?

Upvotes: 0

Views: 74

Answers (2)

Dongwei Wang

Reputation: 495

You can also use the nvcc compilation output to get the static shared memory allocation: compiling with `-Xptxas -v` (or equivalently `--ptxas-options=-v`) makes ptxas report the registers and static shared memory used per kernel.
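For example, assuming a source file named `kernel.cu` (hypothetical name), the compile step might look like:

```shell
# Ask ptxas to print per-kernel resource usage (registers, smem, cmem)
nvcc -Xptxas -v -c kernel.cu
```

The report then contains lines of the form `ptxas info : Used N registers, M bytes smem` for each kernel (values illustrative), where the smem figure is the static shared memory allocation.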

Upvotes: -1

talonmies

Reputation: 72349

The runtime API has a function, cudaFuncGetAttributes, which will allow you to retrieve the attributes of any kernel in the current context, including the amount of static shared memory per block which the kernel will consume (the `sharedSizeBytes` field). You can do the math yourself with that information.
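A minimal sketch of that math (the kernel name `my_kernel` is hypothetical, and error checking is elided for brevity):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void my_kernel() {
    // Dynamically-allocated shared memory, sized at launch time
    extern __shared__ char dynamic_smem[];
    // ... use dynamic_smem ...
}

int main() {
    // Total shared memory per block available on this device
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Static shared memory this particular kernel already consumes
    cudaFuncAttributes attrs;
    cudaFuncGetAttributes(&attrs, my_kernel);

    // Whatever remains can be requested dynamically at launch
    size_t max_dynamic_smem = prop.sharedMemPerBlock - attrs.sharedSizeBytes;
    std::printf("max dynamic shared memory: %zu bytes\n", max_dynamic_smem);

    // Third launch-configuration argument is the dynamic shared memory size
    my_kernel<<<1, 128, max_dynamic_smem>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Note that on newer architectures, opting in to sizes above the default per-block limit requires `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`.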

Upvotes: 2
