Yang Liu

Reputation: 95

Is Threadfence Needed for CUDA Volatile Variables?

volatile forces every read and write of a shared/global memory variable to go directly to shared/global memory. Does this automatically accomplish what __threadfence() does? For example:

volatile __shared__ int s;

s = 2;
s = 10;

Is there then no need for a __threadfence() between "s = 2" and "s = 10"?

Can we say that __threadfence() is never needed for a volatile variable? If not, is there an example where it is?

Upvotes: 0

Views: 613

Answers (1)

Robert Crovella

Reputation: 151869

For a volatile variable in shared memory defined like this:

volatile __shared__ int s;

any access by other threads in the threadblock after the execution of the following line:

s = 2;

will see s as containing 2, assuming there are no further updates to s. However, volatile does not provide any sort of barrier. __threadfence() and its variants (__threadfence_block(), __threadfence_system()) are memory fences for the calling thread, not synchronization points between threads. The calling thread will not proceed beyond the fence until it is guaranteed that its updates to shared memory and global memory (for __threadfence()) are visible to other threads.

However, with the following sequence:

s = 2;
s = 10;

There is no guarantee as to which value other threads will see, only that it will be either 2 or 10 (again assuming there are no further updates to s). The exception is the warp-synchronous case, and even that depends on further details of the scenario, which you have not provided.
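
As for an example where a fence is still needed with volatile variables, here is a minimal sketch of a flag-based handoff between two blocks. The names handoff, run_handoff, payload, ready and out are made up for illustration, and the sketch assumes both single-thread blocks are resident on the GPU at the same time, since the consumer spins. volatile keeps each access from being cached in a register, but it is the __threadfence() that orders the write to payload before the write to ready as observed by the other block.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: block 0 produces a value, block 1 consumes it.
__global__ void handoff(volatile int *payload, volatile int *ready, int *out)
{
    if (blockIdx.x == 0) {
        *payload = 42;      // 1. write the data
        // Without this fence, another block could observe ready == 1
        // before the write to payload becomes visible to it.
        __threadfence();
        *ready = 1;         // 2. publish the flag
    } else {
        // volatile forces a fresh load of the flag on every iteration.
        while (*ready == 0) { }
        // Order the read of payload after the read of the flag.
        __threadfence();
        *out = *payload;
    }
}

// Hypothetical host helper: returns the value seen by the consumer,
// or -1 if a CUDA call fails.
int run_handoff()
{
    int *d_buf = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_buf, 3 * sizeof(int)) != cudaSuccess) return -1;
    // d_buf[0] = payload, d_buf[1] = ready flag, d_buf[2] = result.
    cudaMemset(d_buf, 0, 3 * sizeof(int));
    handoff<<<2, 1>>>(d_buf, d_buf + 1, d_buf + 2);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (cudaMemcpy(&h_out, d_buf + 2, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        h_out = -1;
    }
    cudaFree(d_buf);
    return h_out;
}
```

Calling run_handoff() from a host main() in a file compiled with nvcc is expected to return 42. If you remove the two fences, the code may still appear to work on a given GPU, but the ordering between payload and ready is then no longer guaranteed.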

Upvotes: 1
