Reputation: 150
If there is device code structured as follows
Item* prev_entry = array[entry->prev];
prev_entry->next = entry->next;
And it were rewritten as an atomic operation
atomicExch(&(array[entry->prev]->next), entry->next);
is the memory access of array
done atomically along with the access of next
? There may be other threads that modify entry->prev
(as they may be another Item's next value) and if the array access is done non-atomically then entry->prev
may change between accessing the array and the execution of the atomic operation on the address next
resulting in an incorrect result.
To frame the question more generally, are all operations within an atomic operation's arguments executed atomically?
Upvotes: 0
Views: 227
Reputation: 152113
is the memory access of array done atomically along with the access of next?
No it is not. If you study the corresponding SASS code, you will discover that the read operation associated with entry->next
here:
atomicExch(&(array[entry->prev]->next), entry->next);
is an ordinary read operation, not protected in any way. That read operation puts the atomic "update value" in a register. Another register holds the address to update. The atomic operation works on those registers (returning its result in another register, if relevant).
Here is an example:
$ cat t1983.cu
__global__ void k(int *al, int *d){
atomicExch(al, d[threadIdx.x]);
}
$ nvcc -c t1983.cu
$ cuobjdump -sass ./t1983.o
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit
code for sm_52
Function : _Z1kPiS_
.headerflags @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
/* 0x001c7c00fe0007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ { MOV R2, c[0x0][0x140] ; /* 0x4c98078005070002 */
/*0018*/ S2R R4, SR_TID.X }
/* 0xf0c8000002170004 */
/* 0x001fc800fec20ff1 */
/*0028*/ SHR.U32 R0, R4.reuse, 0x1e ; /* 0x3828000001e70400 */
/*0030*/ ISCADD R4.CC, R4, c[0x0][0x148], 0x2 ; /* 0x4c18810005270404 */
/*0038*/ IADD.X R5, R0, c[0x0][0x14c] ; /* 0x4c10080005370005 */
/* 0x041fc400fe8007b1 */
/*0048*/ LDG.E R4, [R4] ; /* 0xeed4200000070404 */
/*0050*/ MOV R3, c[0x0][0x144] ; /* 0x4c98078005170003 */
/*0058*/ ATOM.E.EXCH RZ, [R2], R4 ; /* 0xed810000004702ff */
/* 0x001ffc00ffe007ed */
/*0068*/ NOP ; /* 0x50b0000000070f00 */
/*0070*/ EXIT ; /* 0xe30000000007000f */
/*0078*/ BRA 0x78 ; /* 0xe2400fffff87000f */
..........
First of all, we note that the atomic operation works purely based on registers:
ATOM.E.EXCH RZ, [R2], R4 ;
The register RZ is the "destination", it is RZ (the always-zero register, acting as a "discard" register) because we are not asking for the return value of the function. The register pair R2,R3 comprises the 64 bit address of the location to atomically "update" (in this case, replace its value), and the replacement value is contained in R4. Working backward we see that R4 was loaded here:
LDG.E R4, [R4] ;
which is completely separate from the atomic. That is an "ordinary" load. You can push backward to find out that the register pair R4,R5 which contains the load address is populated with the kernel argument from constant memory, offset using the SR_TID.X
register (corresponding to threadIdx.x
) which makes sense.
Likewise the R2,R3 register pair, containing the address of the atomic update location, is loaded directly from kernel arguments, with no offset, which also makes sense.
Upvotes: 1