Waldorf

Reputation: 883

Memory barrier on single core ARM

There is a lot of information related to memory barriers. Most info refers to multicore or multi processor architectures. Somewhere here on Stackoverflow is also stated that memory barriers are not required on single core processors.

So far I cannot find any clear explanation of why it should not be required on single-core CPUs. Suppose a load and a store are reordered in thread A and a context switch occurs between the two instructions. In this case thread B might not behave as expected. Why would a context switch on a single core behave differently from 2 threads on different cores? (except for any cache coherency issues)

For example, some info from the ARM website:

"It is architecturally defined that software must perform a Data Memory Barrier (DMB) operation:

- between acquiring a resource, for example, through locking a mutex (MUTual EXclusion) or decrementing a semaphore, and making any access to that resource

- before making a resource available, for example, through unlocking a mutex or incrementing a semaphore"

This sounds very clear; however, the provided example refers explicitly to a multi-core configuration.
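To make the quoted rule concrete, here is a minimal sketch of the barrier placement it describes, written with C11 atomics rather than ARM assembly. The spinlock, the variable names, and the `increment_shared` helper are all hypothetical, chosen only to show where the barriers sit; on ARM the acquire and release fences below would compile down to DMB operations:

```c
#include <stdatomic.h>

/* Hypothetical spinlock illustrating the ARM rule: a barrier after
 * acquiring a resource and a barrier before releasing it. */
static atomic_int lock = 0;
static int shared_data = 0;

void spin_lock(void) {
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
            &lock, &expected, 1,
            memory_order_acquire,   /* barrier after acquiring the lock */
            memory_order_relaxed))
        expected = 0;               /* CAS failed: reset and retry */
}

void spin_unlock(void) {
    /* barrier before making the resource available again */
    atomic_store_explicit(&lock, 0, memory_order_release);
}

int increment_shared(void) {
    spin_lock();
    int v = ++shared_data;          /* access protected by the lock */
    spin_unlock();
    return v;
}
```

The acquire fence keeps accesses to `shared_data` from being hoisted above the lock; the release fence keeps them from sinking below the unlock — which is precisely the DMB placement the ARM text mandates.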

Upvotes: 6

Views: 4519

Answers (2)

user1751957

Reputation: 11

The CPU only re-orders instructions that have already been "issued", so a context switch will not halt any of the instructions already in the pipeline; they will continue to execute until complete.

It is unlikely that, by the time the context switch has completed, any of those instructions remain incomplete. A context switch typically saves the state of all registers, thereby creating a dependence on every register-modifying instruction completing first.

However, even in the unlikely situation of re-ordered instructions (possibly a memory store) still executing past the context switch, the CPU ensures that the instructions appear to the software to execute in the correct order. So as the second thread tries to access the shared data, the CPU will ensure that the necessary instructions have completed before allowing dependent instructions to execute.

The multi-core situation is really a case of maintaining the ordering of writes to the cache/memory so that the other core(s) see the changes occur in the right order. A memory barrier is only required for that.

Upvotes: 1

artless-noise-bye-due2AI

Reputation: 22395

Why would a context switch on a single core behave differently compared to 2 threads on different cores ? (except any cache coherency issues)

The threads on separate cores may act at exactly the same time. You still have issues on a single core.

Somewhere here on Stackoverflow is also stated that memory barriers are not required on single core processors.

This information may be taken out of context (or may not provide enough context).


Wikipedia's Memory barrier and Memory ordering pages have sections Out-of-order execution versus compiler reordering optimizations and Compile time/Run time ordering. There are many places in a pipeline where the ordering of memory may matter. In some cases, this may be taken care of by the compiler, by the OS, or by our own code.

Compiler memory barriers apply to a single CPU. They are especially useful with hardware where the ordering and timing of writes and reads matter.
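One common form of such a compiler-only barrier is the classic GCC/Clang inline-assembly idiom. The sketch below is illustrative; `fake_device_reg` and `write_then_read` are made-up names standing in for a memory-mapped hardware register and a driver routine:

```c
/* A compiler barrier: prevents the compiler from reordering or caching
 * memory accesses across this point, but emits no CPU instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a memory-mapped device register. */
volatile unsigned int fake_device_reg;

unsigned int write_then_read(unsigned int v) {
    fake_device_reg = v;
    compiler_barrier();   /* the write above may not move past this point */
    return fake_device_reg;
}
```

Note that this constrains only the compiler; on a multi-core system, or for device memory with weaker attributes, a hardware barrier (DMB/DSB) would still be needed on top of it.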

Linux defines several more types of memory barriers:

  1. Write/Store.
  2. Data dependency.
  3. Read/Load.
  4. General memory barriers.

Mainly these map fairly well to DMB (DSB and IMB are more for code modification).
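As a rough illustration of that mapping, the Linux barrier types above can be approximated in userspace with C11 fences. The wrapper names and the publish/consume pair below are hypothetical; on ARM each fence would typically lower to a DMB variant (for example, a store-store barrier becomes `dmb ishst`):

```c
#include <stdatomic.h>

/* Userspace analogues of the Linux barrier types, as C11 fences. */
static inline void write_barrier(void) { atomic_thread_fence(memory_order_release); } /* ~smp_wmb() */
static inline void read_barrier(void)  { atomic_thread_fence(memory_order_acquire); } /* ~smp_rmb() */
static inline void full_barrier(void)  { atomic_thread_fence(memory_order_seq_cst); } /* ~smp_mb()  */

static int payload;
static atomic_int ready;

void publish(int v) {
    payload = v;
    write_barrier();        /* order the payload store before the flag */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int consume(void) {
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                   /* spin until the flag is set */
    read_barrier();         /* order the flag load before the payload load */
    return payload;
}
```

The data-dependency barrier from the list is the odd one out: on ARM it is essentially free for true address dependencies, which is why Linux can define it as a no-op there in most configurations.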

The more advanced ARM CPUs have multiple load/store units. In theory, a non-preemptive thread switch (see Note1), especially with aliased memory, could cause an issue in a multi-threaded single-CPU application. However, it would be fairly hard to construct such a case.

For the most part, good memory ordering is handled by the CPU by scheduling instructions. A common case where it does matter with a single CPU is for system level programmers altering CP15 registers. For instance, an ISB should be issued when turning on the MMU. The same may be true for certain hardware/device registers. Finally, a program loader will need barriers as well as cache operations, even on single CPU systems.

UnixSmurf wrote some blog posts on memory access ordering.

The topic is complex and you have to be specific about the types of barriers you are discussing.

Note1: I say non-preemptive because, if an interrupt occurs, the single CPU will probably ensure that all outstanding memory requests are complete. With a non-preemptive switch, you do something like longjmp to change threads. In theory, you could change contexts before all writes had completed. The system would only need a DMB in yield() to avoid this.
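Sketched in C, the fix suggested in Note1 amounts to a full fence at the top of the cooperative yield. `switch_to_next()` is a hypothetical stub standing in for the real longjmp-based context switch; only the barrier placement is the point:

```c
#include <stdatomic.h>

static int switches;   /* counts switches, for illustration only */

/* Hypothetical stub for the real longjmp-based thread switch. */
static void switch_to_next(void) { switches++; }

/* Cooperative yield: issue a full barrier (a DMB on ARM) so that every
 * outstanding store from the outgoing thread is ordered before the next
 * thread runs, then hand over the CPU. */
void yield(void) {
    atomic_thread_fence(memory_order_seq_cst);
    switch_to_next();
}
```

With the fence inside yield(), individual threads never need their own barriers around shared data, which is exactly why preemptive kernels can hide this detail from single-core application code.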

Upvotes: 6
