RootPhoenix

Reputation: 1747

Why we do not use barriers in User space

I am reading about memory barriers, and what I can summarize so far is that they prevent instruction reordering done by compilers.

So in user-space memory, let's say I have:

b = 0;
main() {
    a = 10;
    b = 20;
    c = add(a, b);
}

Can the compiler reorder this code so that the b = 20 assignment happens after c = add() is called?

Why do we not use barriers in this case? Am I missing something fundamental here?

Is virtual memory exempt from any reordering?

Extending the question further:

In a network driver:

        /*
         * Writing to TxStatus triggers a DMA transfer of the data
         * copied to tp->tx_buf[entry] above. Use a memory barrier
         * to make sure that the device sees the updated data.
         */
        wmb();
        RTL_W32_F (TxStatus0 + (entry * sizeof (u32)),
                   tp->tx_flag | max(len, (unsigned int)ETH_ZLEN));


When the comment says the device sees the updated data, how does this relate to the multi-threaded reasoning for using barriers?

Upvotes: 3

Views: 1932

Answers (3)

Gabriel Southern

Reputation: 10063

Short answer

Memory barriers are used less frequently in user-mode code than in kernel-mode code because user-mode code tends to use higher-level abstractions (for example, pthread synchronization operations).

Additional details

There are two things to consider when analyzing the possible ordering of operations:

  1. What order the thread that is executing the code will see the operations in
  2. What order other threads will see the operations in

In your example the compiler cannot reorder b=20 to occur after c=add(a,b) because the c=add(a,b) operation uses the results of b=20. However, it may be possible for the compiler to reorder these operations so that other threads see the memory location associated with c change before the memory location associated with b changes.

Whether this would actually happen or not depends on the memory consistency model that is implemented by the hardware.

As for when the compiler might do reordering you could imagine adding another variable as follows:

b = 0;
main() {
    a = 10;
    b = 20;
    d = 30;
    c = add(a, b);
}

In this case the compiler would be free to move the d=30 assignment to occur after c=add(a,b).

However, this entire example is too simplistic: the program doesn't do anything observable, so the compiler can eliminate all of the operations and never needs to write anything to memory.

Addendum: Memory reordering example

In a multiprocessor environment multiple threads can see memory operations occur in different orders. The Intel Software Developer's Manual has some examples in Volume 3 section 8.2.3. I've copied a screenshot below that shows an example where loads and stores can be reordered. There is also a good blog post that provides some more detail about this example.

[Figure: Loads reordered with earlier store to different locations]

Upvotes: 4

Erik

Reputation: 2051

The compiler cannot reorder (nor can the runtime or the CPU) so that b = 20 happens after c = add(), since that would change the semantics of the method, which is not permissible. For the compiler (or runtime or CPU) to act as you describe would make the behaviour random, which would be a bad thing.

This restriction on reordering applies only within the thread executing the code. As @GabrielSouthern points out, the order in which the stores become globally visible is not guaranteed if a, b, and c are all global variables.

Upvotes: 1

Peter Cordes

Reputation: 364039

The thread running the code will always act as if the effects of the source lines of its own code happened in program order. This "as-if" rule is what enables most compiler optimizations.

Within a single thread, out-of-order CPUs track dependencies to give a thread the illusion that all its instructions executed in program order. The globally-visible (to threads on other cores) effects may be seen out-of-order by other cores, though.

Memory barriers (as part of locking, or on their own) are only needed in code that interacts with other threads through shared memory.

Compilers can similarly do any reordering / hoisting they want, as long as the results are the same. The C++ memory model is very weak, so compile-time reordering is possible even when targeting an x86 CPU. (But of course not reordering that produces different results within the local thread.) C11 <stdatomic.h> and the equivalent C++11 std::atomic are the best way to tell the compiler about any ordering requirements you have for the global visibility of operations. On x86, this usually just results in putting store instructions in source order, but the default memory_order_seq_cst needs an MFENCE on each store to prevent StoreLoad reordering for full sequential consistency.

In kernel code, memory barriers are also common to make sure that stores to memory-mapped I/O registers happen in a required order. The reasoning is the same: to order the globally-visible effects on memory of a sequence of stores and loads. The difference is that the observer is an I/O device, not a thread on another CPU. The fact that cores interact with each other through a cache coherency protocol is irrelevant.

Upvotes: 2
