Farzad

Reputation: 3438

Why is using the "volatile" keyword for shared memory not possible when atomic operations are done on that shared memory?

I have a piece of CUDA code in which threads perform atomic operations on shared memory. Since the result of an atomic operation is visible to the other threads of the block immediately anyway, I thought it might be good to instruct the compiler to treat the shared memory as volatile.
So I changed

__global__ void CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
        atomicAdd( smem_data+threadIdx.x, 6);
    }
}

to

__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
        atomicAdd( smem_data+threadIdx.x, 6);
    }
}

With the above change, the following compile-time error occurs:

error: no instance of overloaded function "atomicAdd" matches the argument list
        argument types are: (volatile int *, int)

Why isn't a volatile address supported as an argument for atomic operations? Is it because the compiler already treats the shared memory as volatile as soon as it identifies that atomic operations will be performed on it?

Upvotes: 9

Views: 4471

Answers (2)

user3344003

Reputation: 21617

The previous poster has correctly identified the problem: There is no atomicAdd function defined that takes a volatile parameter.

As to why this is the case, my guess is that the library developers simply omitted that interface. Imagine all the combinations of volatile, const, and possible parameter types, and the number of potential interfaces starts to explode.

Why isn't a volatile address supported as an argument for atomic operations?

Atomic operations such as atomicAdd are not part of the C/C++ language itself. In your case, they are provided as library functions, probably implemented in assembly language.

Is it because the compiler already treats the shared memory as volatile as soon as it identifies that atomic operations will be performed on it?

No, this is simply the way the library writer has defined the function interface.
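
That said, if a volatile-qualified interface is wanted, a thin user-side wrapper can supply it. A minimal sketch (the name volatileAtomicAdd is hypothetical, not part of CUDA):

    // Hypothetical user-supplied overload: accepts a volatile pointer and
    // forwards to the built-in atomicAdd after casting the qualifier away.
    __device__ int volatileAtomicAdd(volatile int *address, int val)
    {
        // const_cast may remove volatile as well as const; the pointed-to
        // shared/global memory location is unchanged.
        return atomicAdd(const_cast<int *>(address), val);
    }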

Upvotes: -1

Robert Crovella

Reputation: 151879

The definition of the volatile qualifier is given in the programming guide. It instructs the compiler to always generate an actual read or write for each access, and never to "optimize" the access away, for example by caching the value in a register.

Since atomic operations are guaranteed to act on actual memory locations (either shared or global), the combination of the two is unnecessary. Therefore, versions of the atomic functions prototyped for the volatile qualifier are not provided.

If you have a memory location that is already declared as volatile, simply cast the address to the corresponding non-volatile pointer type when you pass it to your atomic function. The behavior will be as expected.
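
For example, applying that cast to the kernel from the question (a sketch; a C-style cast is shown, though const_cast works equally well):

    __global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
    {
        volatile __shared__ int smem_data[BLOCK_SIZE];
        uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
        for ( uint i = tid; i < nElem; i += blockDim.x*gridDim.x){
            // Cast away volatile so the call matches atomicAdd(int *, int);
            // the atomic still operates on the same shared memory location.
            atomicAdd( (int *)(smem_data + threadIdx.x), 6);
        }
    }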

Therefore, atomic operations can operate on locations specified as volatile with this proviso.

The simple fact that you have accessed a particular location using atomics somewhere in your code does not mean that the compiler will treat every access elsewhere as implicitly volatile. If you need volatile behavior elsewhere, declare it explicitly.
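
As a sketch of that last point (the kernel name and layout here are hypothetical), a volatile-qualified pointer can be introduced for just the accesses that need it:

    __global__ void sketch_kernel(int *out)
    {
        __shared__ int smem_data[BLOCK_SIZE];   // not declared volatile
        smem_data[threadIdx.x] = 0;
        __syncthreads();

        atomicAdd(smem_data + threadIdx.x, 6);  // atomic access compiles as-is

        // The atomic above does not make this later plain access implicitly
        // volatile. A volatile-qualified pointer forces an actual shared
        // memory read for this access only.
        volatile int *vsmem = smem_data;
        out[threadIdx.x] = vsmem[threadIdx.x];
    }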

Upvotes: 8
