Reputation: 25
I have quite a number of constants that govern memory allocations, number of loop iterations, etc. in my OpenCL kernel. Is it faster to use global __constants or #defines?
Upvotes: 2
Views: 6854
Reputation: 1619
#define works the same way as in C. An exception to this is all versions before AMD APP SDK v2.8 (without OpenCL 1.2 support).
__constant is the cached memory space. Please do read more about the memory layout in OpenCL.
__global is the total memory of the GPU, visible to all threads.
__local is the local memory of the GPU, visible only to threads inside the same block (work-group).
__constant is cached memory that is much faster than global memory but limited in size, so use it only where required.
__private is the private memory of the GPU, visible only to each individual thread.
Note: by "threads" I mean processing elements.
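To make the four address spaces concrete, here is a minimal sketch of one kernel that touches each of them. The kernel name axpy_demo, its parameters, and the shim at the top are my own assumptions for illustration. The shim only exists so the snippet also compiles as plain C for experimenting, and you would drop it in a real .cl file.

```c
/* Shim (assumption, not part of OpenCL): lets the sketch compile as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#define __local
#define __constant const
#define __private
static int fake_id = 0;                          /* hypothetical stand-in */
static int get_global_id(int dim) { (void)dim; return fake_id; }
static int get_local_id(int dim)  { (void)dim; return fake_id; }
#endif

__kernel void axpy_demo(__constant float *coeff,   /* small, read-only, cached: coeff[0] = a, coeff[1] = b */
                        __global const float *in,  /* device memory, visible to all work-items */
                        __global float *out,
                        __local float *scratch)    /* shared within one work-group */
{
    int gid = get_global_id(0);   /* gid and lid live in private memory */
    int lid = get_local_id(0);

    /* Each work-item touches only its own slot, so no barrier is needed here;
       reading a neighbour's slot would require barrier(CLK_LOCAL_MEM_FENCE). */
    scratch[lid] = in[gid];

    __private float y = coeff[0] * scratch[lid] + coeff[1];
    out[gid] = y;
}
```

On the host side, the __local argument would typically be sized with clSetKernelArg by passing a byte count and a NULL pointer, while the __constant and __global arguments are ordinary buffers.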
Upvotes: 1
Reputation: 2181
The same rules as for a "normal" C compiler apply to an OpenCL compiler: a #define is replaced with its value before actual compilation, so the value is baked into the kernel.
By definition, a __constant variable is allocated in global memory and must be transferred before use. This is slower than using a #defined literal. However, GPU architectures from NVIDIA and AMD cache these values, so they are faster to read than ordinary global memory.
End of story and my personal advice: use #defines for constant values as well as "magic" numbers, and __constant memory for larger, fast but read-only memory blocks (e.g. lookup tables).
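A rough sketch of that split, not a benchmark: the scalar constants are #defines and the lookup table sits in __constant memory. The names SCALE, LUT_SIZE, lut and scale_lut are invented for this example, and the shim at the top is only an assumption that lets the snippet compile as plain C as well. Leave it out of a real kernel file.

```c
/* Shim (assumption, not part of OpenCL): lets the sketch compile as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#define __constant const
static int fake_gid = 0;                         /* hypothetical stand-in */
static int get_global_id(int dim) { (void)dim; return fake_gid; }
#endif

/* Scalar constants: substituted by the preprocessor, so the compiler
   sees plain literals. */
#define SCALE    4
#define LUT_SIZE 8

/* Larger read-only data: a program-scope table in the constant address space. */
__constant float lut[LUT_SIZE] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f, 3.0f, 3.5f};

__kernel void scale_lut(__global const unsigned int *idx, __global float *out)
{
    int gid = get_global_id(0);
    /* LUT_SIZE and SCALE are literals here; lut[] is read from constant memory. */
    out[gid] = SCALE * lut[idx[gid] % LUT_SIZE];
}
```

If the values are only known at run time on the host, the same #define approach still works by passing them as -D NAME=value in the options string of clBuildProgram, at the cost of recompiling the kernel when they change.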
Upvotes: 5