Andrew

Reputation: 205

CUDA: Shared Memory vs Constant Memory

I need a large amount of constant data, more than 6-8 KB, up to 16 KB. At the same time, I don't use shared memory, so I am considering storing this constant data in shared memory instead. Is that a good idea? Any rough performance estimates? Does broadcasting work for shared memory as well as for constant memory?

Performance is critical for this application, and I believe I have only 8 KB of constant memory cache on my Tesla C2075 (compute capability 2.0).

Upvotes: 4

Views: 5748

Answers (1)

Roger Dahl

Reputation: 15734

On compute capability 2.0 devices, L1 and shared memory share the same physical memory, and the partitioning between them can be controlled per kernel with the cudaFuncSetCacheConfig() call. I would suggest setting L1 to the maximum possible size (48 KB) with

cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1);

Then, pull your constant data from global memory and let L1 handle the caching. If you have multiple arrays that are constant, you can direct the compiler to use the constant cache for some of them by adding the const qualifier to the corresponding kernel arguments. That way, you can leverage both L1 and the constant cache for your constants.
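A minimal sketch of that setup, assuming a hypothetical kernel MyKernel and a 16 KB coefficient table kept in global memory (names and sizes are illustrative, not from your code):

#include <cuda_runtime.h>

// Hypothetical kernel: reads a 16 KB coefficient table held in global memory
// through a const-qualified pointer. With cudaFuncCachePreferL1, repeated
// reads of coeffs are served from the 48 KB L1.
__global__ void MyKernel(const float* __restrict__ coeffs, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = coeffs[i % 4096] * 2.0f;   // 4096 floats = 16 KB of "constants"
}

int main()
{
    const int n = 1 << 20;
    const int tableSize = 4096;

    float *d_coeffs, *d_out;
    cudaMalloc(&d_coeffs, tableSize * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    // ... copy the coefficient table to d_coeffs with cudaMemcpy ...

    // Request the 48 KB L1 / 16 KB shared memory split for this kernel.
    cudaFuncSetCacheConfig(MyKernel, cudaFuncCachePreferL1);

    MyKernel<<<(n + 255) / 256, 256>>>(d_coeffs, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_coeffs);
    cudaFree(d_out);
    return 0;
}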

Broadcasting works for both L1 and constant cache accesses.
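A hypothetical illustration of the broadcast case, where every thread in a warp reads the same element of the table in the same instruction:

// Hypothetical kernel: all threads read coeffs[k] for the same k, so the
// value is fetched once and broadcast to the warp, whether it is served
// from the constant cache or from L1.
__global__ void ApplyCoeff(const float* __restrict__ coeffs, float* data, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float c = coeffs[k];        // uniform address across the warp -> broadcast
    if (i < n)
        data[i] *= c;
}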

Upvotes: 4
