How is this a guarantee a value has been atomically updated in ARM?

ARM provides LDREX/STREX to atomically load and store values, but I feel like I'm missing something about how the whole sequence is still an atomic operation. The code below is generally how an increment by one would be done. However, what prevents something from preempting during the ADDS instruction, so that r2 no longer matches what is stored in [r0]?

(Assuming r0 is valid and r1 = 1)

ADD
    LDREX r2, [r0]
    ADDS  r2, r2, #1
    STREXNE r2, r1, [r0]        @ Store 1 if the original [r0] was not -1
    CMPNE r2, #1
    BEQ ADD

Upvotes: 2

Answers (1)

old_timer

Reputation: 71516

The ldrex/strex pair works because the logic keeps track of exclusive accesses relative to a process id; the exclusive access and the id are both presented on the bus at the time.

So if another access lands between the ldrex and the strex

ldrex process x
strex process x

due to an interrupt or anything else, the logic is supposed to return a not-okay, and the strex returns

1 If the operation fails to update memory.

as documented.
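The retry-on-failure pattern can be sketched in portable C. This is a minimal sketch, not the asker's code: `increment_with_retry` is a hypothetical helper name, and it uses the C11 weak compare-exchange, which compilers targeting ARMv7 commonly lower to an ldrex/strex loop (an assumption about the toolchain, not something the C standard promises). A failed exchange plays the role of a strex that returned 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically add 1 to *counter and return the
 * value it held before the update. */
unsigned increment_with_retry(atomic_uint *counter)
{
    assert(counter != NULL);

    /* Plays the role of the ldrex: take a snapshot of the current value. */
    unsigned seen = atomic_load_explicit(counter, memory_order_relaxed);

    /* Plays the role of the strex: the store only happens if *counter
     * still equals `seen`. On failure (including spurious failure, which
     * the weak form allows), `seen` is refreshed with the current value
     * and the loop tries again with a new seen + 1. */
    while (!atomic_compare_exchange_weak_explicit(counter, &seen, seen + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* nothing to do: `seen` was already reloaded for the retry */
    }
    return seen;
}
```

Preemption between the load and the store does not break this, because the store is conditional: if anything changed the value in between, the store is refused and the whole read-modify-write is redone.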

Now the gray area here is multi-fold. The ARM logic itself (the caches made by ARM: the L1, and the L2 if you bought one from them) will support exclusive access. At one time the ARM documentation said, and it may still say, that a uniprocessor design (only one core implemented) does not have to support exclusive access. You may find that such non-support simply returns an EXOKAY instead of an OKAY on the bus (success vs. fail) without actually keeping track of anything. But to see that, the access has to get past the layers of caching, which means the caches are off, which pretty much means you are not running an OS, since it is a pain to disable or not enable the cache.

The hardware folks are/were told that they do not have to support exclusive access for uniprocessors, and the general population was told that ldrex/strex are NOT a replacement for SWP (which is still present in a number of cores). ldrex/strex are specifically for multiple cores sharing resources: they let the different cores talk to each other, they are not for one core to compete with itself.

The software folks were told in places that they are a replacement for SWP. You also have the problem of the process id: on a uniprocessor, do these transactions carry different ids? If so, how and when did you set those ids? Even if the hardware properly supports exclusive access and is multi-processor, if your two threads share the same id, or all the threads on that core share the same id, then they will interfere with each other. This should be trivial to test with an experiment, though.

The software community, Linux in particular, is focused on ldrex/strex being a replacement for SWP. That made things hard for the one or few vendors who read that they did not have to support exclusive access, left it out, and found that Linux did not work. At the same time there are a disturbing number of bugs in the Linux kernel related to ARM in particular; porting each new release takes a lot of work because so many errata fixes and other workarounds are applied improperly. And I suspect many people porting Linux are not aware of the bugs they are creating and/or leaving in their ports.

In short, the theory is that each thread has its own process id and the logic keeps track of accesses to the addresses in question along with the process id. If there is an access in between one process's ldrex and strex, then the strex will fail and you have to start over with another ldrex, which is why it is in a loop.

ldrex id x
...
strex id x  (passes)


ldrex id x
...
ldrex id y
...
strex id x (pass)
...
strex id y (fail)


ldrex id x
... 
ldrex id y
...
strex id y (pass)
...
strex id x (fail)

and so on.
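The three sequences above can be replayed against a toy software model of the monitor. This is only a sketch under assumptions that are mine, not ARM's: `toy_monitor`, `toy_ldrex` and `toy_strex` are hypothetical names, each id holds at most one reservation, a strex always consumes its own reservation, and a successful strex cancels every other id's reservation on the same address. Real hardware is allowed to fail more often than this model does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_IDS 4            /* arbitrary limit for the toy model */

typedef struct {
    bool      armed[TOY_MAX_IDS];  /* does this id hold a reservation? */
    uintptr_t addr[TOY_MAX_IDS];   /* address reserved by this id      */
} toy_monitor;

/* Model of ldrex: record a reservation for (id, address), return the value. */
uint32_t toy_ldrex(toy_monitor *m, int id, const uint32_t *p)
{
    assert(id >= 0 && id < TOY_MAX_IDS);
    m->armed[id] = true;
    m->addr[id]  = (uintptr_t)p;
    return *p;
}

/* Model of strex: returns 0 if the store happened, 1 if it did not,
 * matching the status value quoted from the documentation. */
int toy_strex(toy_monitor *m, int id, uint32_t *p, uint32_t value)
{
    assert(id >= 0 && id < TOY_MAX_IDS);
    bool ok = m->armed[id] && m->addr[id] == (uintptr_t)p;
    m->armed[id] = false;        /* the reservation is used up either way */
    if (!ok)
        return 1;
    *p = value;
    /* A successful store invalidates everyone else's reservation here. */
    for (int i = 0; i < TOY_MAX_IDS; i++)
        if (m->armed[i] && m->addr[i] == (uintptr_t)p)
            m->armed[i] = false;
    return 0;
}
```

With x as id 0 and y as id 1 on the same word, `toy_ldrex` for x then y followed by `toy_strex` for x and then y yields 0 and then 1; swapping the order of the two stores swaps the results, as in the second and third sequences.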

Obviously the logic cannot store history for an infinite number of addresses and process ids, so naturally if

ldrex id x
...
strex id x

has a ton of accesses in between, you can expect a failure from time to time.

Also note that I think one or more of the Cortex-M cores do not support ldrex/strex in the ARM logic.

Well, okay, there is this language, for example:

The Cortex-M3 processor implements a local exclusive monitor. The local monitor within the processor has been constructed so that it does not hold any physical address, but instead treats any access as matching the address of the previous LDREX. This means that the implemented exclusives reservation granule is the entire memory address range.

I also see this language for other Cortex-M cores.
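One way to read that quoted description is as a monitor that is a single flag with no address attached. The sketch below models only that reading; `flag_monitor`, `flag_ldrex`, `flag_strex` and `flag_clrex` are hypothetical names, and the behavior is my interpretation of the quoted text, not something measured on a Cortex-M3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address-less local monitor: one bit, no stored address. */
typedef struct {
    bool armed;
} flag_monitor;

/* Model of ldrex: arm the monitor and return the loaded value. */
uint32_t flag_ldrex(flag_monitor *m, const uint32_t *p)
{
    assert(m != 0 && p != 0);
    m->armed = true;
    return *p;
}

/* Model of strex: succeeds (returns 0) whenever the monitor is armed,
 * regardless of which address is written, because no address is kept.
 * Returns 1 and leaves memory untouched otherwise. */
int flag_strex(flag_monitor *m, uint32_t *p, uint32_t value)
{
    assert(m != 0 && p != 0);
    bool ok = m->armed;
    m->armed = false;            /* any strex clears the flag */
    if (!ok)
        return 1;
    *p = value;
    return 0;
}

/* Model of clrex: drop the reservation without storing anything. */
void flag_clrex(flag_monitor *m)
{
    assert(m != 0);
    m->armed = false;
}
```

In this model an ldrex on one word followed by a strex to a different word still succeeds, which is what "the reservation granule is the entire memory address range" would imply, while a second strex without a new ldrex fails.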

More text from the documentation to ponder:

The Load-Exclusive instruction always successfully reads a value from memory address x.

The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other processor or process has performed a more recent store of address x. The Store-Exclusive operation returns a status bit that indicates whether the memory write succeeded.

For memory regions that do not have the shareable attribute, the exclusive access instructions rely on a local monitor that tags any address from which the processor executes a Load-Exclusive. Any non-aborted attempt by the same processor to use a Store-Exclusive to modify any address is guaranteed to clear the tag.

Notice how processor and/or process is used, and not a term like thread. Also note that the comment is about a store exclusive and not a store in general. So while experimenting you should also try:

ldrex
...
str
...
strex

and see what happens.

Upvotes: 2
