Dale

Reputation: 25

OpenCL __constant vs #define

I have quite a number of constants that govern memory allocations, the number of loop iterations, etc. in my OpenCL kernel. Is it faster to use global __constant variables or #defines?

Upvotes: 2

Views: 6854

Answers (2)

Megharaj

Reputation: 1619

#define works the same way as it does in C. The exception is AMD APP SDK versions before v2.8 (those without OpenCL 1.2 support).

__constant is the cached memory space. Please read up on the memory layout in OpenCL:

__global is the main memory of the GPU, visible to all threads.

__local is the local memory of the GPU, visible only to threads inside the same block (work-group).

__constant is cached memory that is much faster than global memory but limited in size, so use it only where required.

__private is the private memory of the GPU, visible only to the individual thread that owns it.

Note: by "threads" I mean processing elements (work-items in OpenCL terminology).

Upvotes: 1

matthias

Reputation: 2181

The same rules as for a "normal" C compiler apply to an OpenCL compiler: a #define is replaced with its value before actual compilation, so the value is baked into the kernel.

By definition, a __constant variable is allocated in global memory and must be transferred before use. This is slower than using a #defined literal. However, GPU architectures from NVIDIA and AMD cache these values, so they are faster to read than ordinary global memory.

End of story and my personal advice: use #defines for constant values and "magic" numbers, and use __constant memory for larger read-only blocks that need fast access (e.g. lookup tables).
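As a rough illustration of that split, here is a minimal sketch of a kernel that uses a #define for a scalar and a __constant array for a small lookup table. The names SCALE, WEIGHTS, apply_weights and host_gid are made up for this example, and the shim at the top is only an assumption added so that the same file also compiles as plain C for a quick host-side check; an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shim (illustrative assumption, not part of OpenCL):
 * lets this file compile as ordinary C so the kernel body can be
 * exercised without an OpenCL runtime. */
#include <stddef.h>
#define __kernel
#define __global
#define __constant const
static size_t host_gid; /* hypothetical stand-in for the work-item index */
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
#endif

/* Replaced by the preprocessor before compilation, so the compiler
 * sees the literal 4 wherever SCALE appears. */
#define SCALE 4

/* Program-scope lookup table in the __constant address space:
 * read-only for the kernel. */
__constant float WEIGHTS[4] = {0.1f, 0.2f, 0.3f, 0.4f};

/* Each work-item scales one input element and weights it by a
 * table entry chosen from its global index. */
__kernel void apply_weights(__global const float *in, __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = SCALE * in[i] * WEIGHTS[i % 4];
}
```

If a value such as SCALE has to change between runs, it can also be supplied when the program is built by passing an option like "-D SCALE=4" to clBuildProgram instead of hard-coding it in the source; it is still substituted before compilation.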

Upvotes: 5
