Shared memory and constants

Question

Is there any benefit to storing constant values in shared memory? For example:

A[tid] = CONSTANT * B[tid]

where A and B are arrays, CONSTANT is a constant value (e.g. 4), and tid is the thread index (one array element per thread).

Every thread has to read value CONSTANT, so shared memory should be useful, right?

How I think it works: reading from global memory takes a long time, so the constant is read once from global memory into shared memory, and the threads can then read it quickly. Since there are many threads (so the constant has to be read many times), shared memory should speed things up.

Roger Dahl · Accepted Answer

Some CPU instruction sets, such as x86, support storing full-sized constants as operands interleaved with the opcodes themselves. In that case, the constants are read in with the rest of the instruction stream that the CPU is running, and it seems unlikely that storing them anywhere else could be any faster.

Other architectures, such as ARM, support storing small constants and shift values within the opcodes. Most constants that are typically needed in a program can be represented as a small constant plus a shift value and can therefore be stored directly within the opcodes.

I don't know if SASS (the native instruction set for NVIDIA GPUs) supports such "embedded" constants.

Consider, though, that if you store the constant in shared memory, you will need to reference that constant, and the reference will itself be a constant or be derived from a constant (such as a base address).

Also, there is a cache for values that are designated as constants. You can take advantage of this cache by setting the values up in constant memory before calling the kernel.
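A minimal sketch of the constant memory route, assuming a single float scale factor. The names kScale, scaleKernel and runConstantDemo are made up for illustration, and error checking is left out to keep it short:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical name: one scale factor placed in constant memory.
__constant__ float kScale;

__global__ void scaleKernel(float *a, const float *b, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        // All threads read the same address, which is the access
        // pattern the constant cache is designed for.
        a[tid] = kScale * b[tid];
    }
}

// Runs the kernel on n elements and returns the number of mismatches
// against the same computation done on the host.
int runConstantDemo(int n)
{
    const float scale = 4.0f;
    std::vector<float> hB(n), hA(n, 0.0f);
    for (int i = 0; i < n; ++i) hB[i] = static_cast<float>(i);

    float *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    // Set the constant up once, before the kernel is launched.
    cudaMemcpyToSymbol(kScale, &scale, sizeof(float));

    scaleKernel<<<(n + 255) / 256, 256>>>(dA, dB, n);
    cudaMemcpy(hA.data(), dA, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hA[i] != scale * hB[i]) ++mismatches;

    cudaFree(dA);
    cudaFree(dB);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runConstantDemo(1000));
    return 0;
}
```

The host writes the value once with cudaMemcpyToSymbol, and no per-block setup code is needed inside the kernel.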

Further, consider the overhead of setting the constant up in shared memory in the first place. Values in shared memory can only be shared among the threads in a block, so each block would have to set the constant up again. Because threads run in groups of 32, called warps, the kernel would tie up 32 threads in setting up the constant each time processing started on a new block.
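To make that overhead concrete, here is a rough sketch of what the shared memory version could look like. It assumes the constant lives in a global memory location passed in as scalePtr; that parameter and the names scaleViaShared and runSharedDemo are hypothetical, and error checking is omitted:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void scaleViaShared(float *a, const float *b,
                               const float *scalePtr, int n)
{
    // One copy of the constant per block.
    __shared__ float sScale;

    // Every block repeats this setup: one thread loads the value
    // from global memory into shared memory...
    if (threadIdx.x == 0) {
        sScale = *scalePtr;
    }
    // ...and the rest of the block waits at the barrier until the
    // value is visible.
    __syncthreads();

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        a[tid] = sScale * b[tid];
    }
}

// Runs the kernel on n elements and returns the number of mismatches
// against the same computation done on the host.
int runSharedDemo(int n)
{
    const float scale = 4.0f;
    std::vector<float> hB(n), hA(n, 0.0f);
    for (int i = 0; i < n; ++i) hB[i] = static_cast<float>(i);

    float *dA = nullptr, *dB = nullptr, *dScale = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dScale, sizeof(float));
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dScale, &scale, sizeof(float), cudaMemcpyHostToDevice);

    scaleViaShared<<<(n + 255) / 256, 256>>>(dA, dB, dScale, n);
    cudaMemcpy(hA.data(), dA, bytes, cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hA[i] != scale * hB[i]) ++mismatches;

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dScale);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runSharedDemo(1000));
    return 0;
}
```

The load, the branch and the barrier are extra work in every block, and they buy nothing for a single value that every thread reads exactly once.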

To conclude, I think it's best to just let the compiler handle single constants such as the one in your example, and to use constant memory for storing any constant arrays that you may have.
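For comparison, a sketch of simply leaving the constant to the compiler, either as a literal in the source or as a kernel argument. The names scaleLiteral, scaleArg and runLiteralDemo are invented for this example. As far as I know, kernel arguments are themselves passed to the device through constant memory, so neither variant needs any explicit setup:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define CONSTANT 4.0f

// The constant is a literal; the compiler decides where it lives.
__global__ void scaleLiteral(float *a, const float *b, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) a[tid] = CONSTANT * b[tid];
}

// The constant is a plain kernel argument chosen at launch time.
__global__ void scaleArg(float *a, const float *b, float scale, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) a[tid] = scale * b[tid];
}

// Runs both kernels on n elements and returns the total number of
// mismatches against the same computation done on the host.
int runLiteralDemo(int n)
{
    std::vector<float> hB(n), hA(n, 0.0f);
    for (int i = 0; i < n; ++i) hB[i] = static_cast<float>(i);

    float *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + 255) / 256;
    int mismatches = 0;

    scaleLiteral<<<blocks, 256>>>(dA, dB, n);
    cudaMemcpy(hA.data(), dA, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        if (hA[i] != CONSTANT * hB[i]) ++mismatches;

    scaleArg<<<blocks, 256>>>(dA, dB, CONSTANT, n);
    cudaMemcpy(hA.data(), dA, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        if (hA[i] != CONSTANT * hB[i]) ++mismatches;

    cudaFree(dA);
    cudaFree(dB);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runLiteralDemo(1000));
    return 0;
}
```

The kernel-argument form is the one to reach for when the value is only known at run time.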
