user3134920

Reputation: 31

For which sizes are plain loads and stores to global memory in CUDA atomic?

Are general reads and writes to global memory atomic in CUDA if:

On Fermi and Kepler at least, are general 4-byte reads and writes to global memory atomic at the warp level, or 8/16-byte instructions atomic at the half/quarter-warp level, if:

If any of those assumptions about atomicity at the warp level is correct, is there any way to exploit this knowledge without risking compatibility with future Compute Capabilities?

Upvotes: 3

Views: 585

Answers (1)

Robert Crovella

Reputation: 151809

Reads and writes generally take place with respect to the caches. By the time the transactions are issued to global memory, there is no guarantee of atomicity in the CUDA programming or memory model, unless atomic instructions are used.

For example, suppose a thread in a threadblock updates a 4-byte quantity in L2 on Kepler. Now, another thread, in another warp, threadblock, or kernel could update just one of those 4 bytes, in the L2, before that cacheline gets evicted to global memory. By the time the cacheline gets evicted to global memory, it may not represent what was written either by the original thread or even the second thread (for example if a third write came along...).

Keep in mind the L2 is a write-back cache, cannot be disabled, and is not bypassed by global reads and writes, except in the case of atomic instructions.
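As a minimal sketch of what "use atomic instructions" can look like in practice: every thread below publishes a 64-bit value with `atomicExch` and bumps a 32-bit counter with `atomicAdd`, instead of relying on plain stores. The names `publish`, `run_publish_demo`, `slot`, and `count`, and the 64 x 256 launch shape, are made up for illustration and are not from the question.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread publishes a 64-bit value and counts itself.
__global__ void publish(unsigned long long *slot, unsigned int *count)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Both 32-bit halves carry the same thread id, so a torn 64-bit write
    // would show up on the host as mismatched halves.
    unsigned long long v = ((unsigned long long)tid << 32) | tid;

    atomicExch(slot, v);   // 64-bit store performed as a single atomic operation
    atomicAdd(count, 1u);  // 32-bit read-modify-write, no lost updates
}

// Hypothetical host helper: launches the kernel and copies the results back.
// Returns false if any CUDA call fails.
bool run_publish_demo(unsigned long long *slot_out, unsigned int *count_out)
{
    unsigned long long *d_slot = nullptr;
    unsigned int *d_count = nullptr;

    if (cudaMalloc(&d_slot, sizeof(*d_slot)) != cudaSuccess) return false;
    if (cudaMalloc(&d_count, sizeof(*d_count)) != cudaSuccess) {
        cudaFree(d_slot);
        return false;
    }

    bool ok = cudaMemset(d_slot, 0, sizeof(*d_slot)) == cudaSuccess
           && cudaMemset(d_count, 0, sizeof(*d_count)) == cudaSuccess;

    if (ok) {
        publish<<<64, 256>>>(d_slot, d_count);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) {
        ok = cudaMemcpy(slot_out, d_slot, sizeof(*d_slot),
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(count_out, d_count, sizeof(*d_count),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_slot);
    cudaFree(d_count);
    return ok;
}
```

Calling `run_publish_demo(&slot, &count)` from `main` should leave `count` equal to the number of launched threads, and the two halves of `slot` should match, although which thread's value ends up in `slot` is not determined. Since this uses only documented atomic functions, it does not depend on how any particular architecture happens to handle plain loads and stores.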

Upvotes: 3
