user1281071

Reputation: 895

Shared memory and constants

Is there any benefit to storing constant values in shared memory? For example:

A[tid] = CONSTANT * B[tid]

where A and B are arrays, CONSTANT is a constant value (e.g. 4), and tid is the thread index (one thread per array element).
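A minimal CUDA sketch of that kernel, assuming float arrays and CONSTANT defined as a compile-time literal (4.0f). The host helper runScaleLiteral and the launch configuration of 256 threads per block are hypothetical additions for illustration only. Whether the literal ends up encoded directly in the multiply instruction can be checked by disassembling the compiled binary with `cuobjdump -sass`.

```cuda
#include <cuda_runtime.h>

#define CONSTANT 4.0f  // compile-time constant from the question's example

// One thread per array element: A[tid] = CONSTANT * B[tid].
__global__ void scaleLiteral(float *A, const float *B, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)                 // guard the last, partially filled block
        A[tid] = CONSTANT * B[tid];
}

// Hypothetical host helper (assumes n > 0): copies B to the device, runs the
// kernel and copies the result back into A. Returns true on success.
bool runScaleLiteral(const float *hostB, float *hostA, int n)
{
    float *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleLiteral<<<blocks, threads>>>(dA, dB, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostA, dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);  // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    return ok;
}
```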

Every thread has to read the value CONSTANT, so shared memory should be useful, right?

How I think it works: reading from global memory is slow, so the constant value could be read once from global memory into shared memory, and then the threads can read it quickly. Since there are many threads (so the constant value has to be read many times), shared memory should speed things up.
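The idea described above can be sketched roughly like this, assuming the constant starts out in a one-element global-memory buffer (constantInGlobal); the kernel and helper names are hypothetical. The sketch also makes two costs of the approach visible: because shared memory is private to a block, every block repeats the load, and the whole block must wait at __syncthreads() before anyone can read the shared copy.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 of each block copies the constant from global
// memory into shared memory, then the whole block reads the shared copy.
__global__ void scaleViaShared(float *A, const float *B,
                               const float *constantInGlobal, int n)
{
    __shared__ float c;          // one copy per block, so every block repeats this load
    if (threadIdx.x == 0)
        c = *constantInGlobal;   // single global-memory read per block
    __syncthreads();             // all threads wait until c has been written

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)                 // bounds check after the barrier, so every thread reaches it
        A[tid] = c * B[tid];
}

// Hypothetical host helper (assumes n > 0). Returns true on success.
bool runScaleViaShared(const float *hostB, float *hostA, float constant, int n)
{
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, sizeof(float)) == cudaSuccess &&
              cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dC, &constant, sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleViaShared<<<blocks, threads>>>(dA, dB, dC, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostA, dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}
```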

Upvotes: 3

Views: 402

Answers (2)

djmj

Reputation: 5544

The constant memory space is cached and is fast to read from, so I doubt storing the value in shared memory would make much of a performance difference.

Upvotes: 1

Roger Dahl

Reputation: 15734

Some CPU instruction sets, such as x86, support storing full-sized constants as immediate operands embedded in the instructions themselves. In that case, the constants are read in along with the rest of the instruction stream that the CPU is executing, and it seems unlikely that storing them anywhere else could be any faster.

Other architectures, such as ARM, support storing small constants and shift values within the opcodes. Most constants that are typically needed in a program can be represented as a small constant plus a shift value and can therefore be stored directly within the opcodes.

I don't know if SASS (the native instruction set for NVIDIA GPUs) supports such "embedded" constants.

Consider, though, that if you store the constant in shared memory, you will need to reference it, and that reference will itself be a constant or be derived from one (such as a base address).

Also, there is a cache for values that are designated as constants. You can take advantage of this cache by setting the values up in constant memory before calling the kernel.
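A minimal sketch of that approach for the scalar in the question, assuming float data; kScale and runScaleViaConstant are hypothetical names. The host writes the value into the __constant__ variable with cudaMemcpyToSymbol once, before the launch, and every thread then reads the same address, which is the access pattern the constant cache serves best.

```cuda
#include <cuda_runtime.h>

__constant__ float kScale;  // hypothetical name; lives in the cached constant memory space

__global__ void scaleViaConstant(float *A, const float *B, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        A[tid] = kScale * B[tid];  // all threads of a warp read the same address
}

// Hypothetical host helper (assumes n > 0). Returns true on success.
bool runScaleViaConstant(const float *hostB, float *hostA, float constant, int n)
{
    // Set the value up in constant memory once, before the kernel is launched.
    if (cudaMemcpyToSymbol(kScale, &constant, sizeof(float)) != cudaSuccess)
        return false;

    float *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleViaConstant<<<blocks, threads>>>(dA, dB, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostA, dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    return ok;
}
```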

Further, consider the overhead of setting the constant up in shared memory in the first place. Values in shared memory can only be shared among the threads in a block, so each block would have to set the constant up again. Because threads run in groups of 32, called warps, the kernel would tie up a whole warp of 32 threads to set up the constant each time processing started on a new block, and the rest of the block would have to wait at a barrier (__syncthreads()) until the value had been written.

To conclude, I think it's best to just let the compiler handle single constants such as the one in your example, and to use constant memory for storing any constant arrays that you may have.
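For the constant-array case, a sketch along the same lines: a small table of coefficients placed in __constant__ memory and read by every thread in the same order, so all threads of a warp hit the same address at each step. The polynomial evaluation, kCoeffs and runPolynomial are hypothetical examples, not taken from the question. For a single value like CONSTANT, a literal or an ordinary kernel argument is the simplest choice; kernel arguments are themselves passed to the device through constant memory.

```cuda
#include <cuda_runtime.h>

constexpr int kNumCoeffs = 3;
__constant__ float kCoeffs[kNumCoeffs];  // hypothetical table: c0 + c1*x + c2*x^2, lowest degree first

// Each thread evaluates the polynomial at B[tid] using Horner's rule.
// All threads walk kCoeffs in the same order, so each read is uniform across the warp.
__global__ void polynomial(float *A, const float *B, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        float x = B[tid];
        float r = kCoeffs[kNumCoeffs - 1];
        for (int i = kNumCoeffs - 2; i >= 0; --i)
            r = r * x + kCoeffs[i];
        A[tid] = r;
    }
}

// Hypothetical host helper (assumes n > 0). Returns true on success.
bool runPolynomial(const float *hostCoeffs, const float *hostB, float *hostA, int n)
{
    // Copy the whole table into constant memory once, before the launch.
    if (cudaMemcpyToSymbol(kCoeffs, hostCoeffs, sizeof(kCoeffs)) != cudaSuccess)
        return false;

    float *dA = nullptr, *dB = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMemcpy(dB, hostB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        polynomial<<<blocks, threads>>>(dA, dB, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(hostA, dA, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA);
    cudaFree(dB);
    return ok;
}
```

If threads in a warp read different elements of a __constant__ array at the same time, those reads are serialized, so constant memory suits tables that all threads index uniformly rather than per-thread lookups.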

Upvotes: 4
