IUnknown

Reputation: 9809

Clarification on single writer

I have a couple of clarifications on the statements below (source - https://mechanical-sympathy.blogspot.in/2011/09/single-writer-principle.html):

'x86/x64 have a memory model, whereby load/store memory operations have preserved order, thus memory barriers are not required if you adhere strictly to the single writer principle.

On x86/x64 "loads can be re-ordered with older stores" according to the memory model so memory barriers are required when multiple threads mutate the same data across cores. '

Does this mean that:
1. Within a single core, load/store memory operations are always in order?
So a single writer thread (and multiple reader threads) on a single-core system would not need to 'synchronize' to resolve visibility issues?
2. For multiple cores, loads can be re-ordered with stores initiated from other cores?
So a single writer thread (and multiple reader threads running on other cores) would not need to 'synchronize' to resolve visibility issues (as there won't be any competing stores)?

So, if we strictly maintain a single writer, can we actually do away with the practice of using 'synchronized' on both reads and writes in primitive locks? Can we do away with 'synchronized' completely?

Upvotes: 3

Views: 836

Answers (2)

Eugene

Reputation: 120848

Big disclaimer

Some of the things I have written here I actually tested - like re-ordering, flushing, etc.; others took a lot of reading, and I hope I got them right.

Everything can be re-ordered; the strategy of not re-ordering and running your program exactly as written was dropped years ago. As long as the observable result does not change, operations are re-ordered as the compiler and CPU please.

For example:

 static int sum(int x, int y){
     x = x + 1;
     y = y + 1;
     return x + y;
 }

You don't really care the order in which these operations are done as long as the result is correct, do you?
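For visibility across threads, though, re-ordering does matter. A minimal sketch (class and field names are hypothetical, chosen for illustration) of how plain, non-volatile writes can be perceived out of order by another thread:

```java
// Hypothetical sketch: with plain (non-volatile) fields, nothing stops the
// JIT/CPU from re-ordering (1) and (2), so a concurrent reader can observe
// ready == true while value is still 0.
class Reordering {
    int value = 0;
    boolean ready = false;   // plain field: no ordering guarantee

    void writer() {
        value = 42;          // (1) may be re-ordered after (2)
        ready = true;        // (2)
    }

    Integer reader() {
        if (ready) {         // another thread may see true here...
            return value;    // ...yet still read the stale 0
        }
        return null;         // writer hasn't published yet
    }
}
```

Single-threaded, of course, the result is always consistent - the race only exists between threads.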

Without memory barriers (often written StoreLoad|StoreStore|LoadStore|LoadLoad) any operation can be re-ordered. To guarantee that certain operations do not move beyond a certain point, CPUs implement fences. Java has a few ways to generate them - volatile, synchronization, Unsafe/VarHandle (there might be others, I don't know).

Basically when you write to a volatile for example, this happens:

volatile x...

[StoreStore] - inserted by the compiler
[LoadStore]
x  = 1; // volatile store
[StoreLoad] 

...

[StoreLoad]
int t = x; // volatile load
[LoadLoad]
[LoadStore]

Let's take a subset of that example:

[StoreStore]
[LoadStore]
x = 1; // volatile store

This means that no preceding Store or Load can be re-ordered past x = 1. The same principle applies to the other barriers.
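Since Java 9 you can see these barrier pairs almost literally in the explicit fence methods on VarHandle (the method names are the real API; the surrounding class is a hypothetical sketch of publish/consume):

```java
import java.lang.invoke.VarHandle;

// Sketch mapping the barrier diagram above onto VarHandle's explicit fences.
class Fences {
    int data;
    boolean published;

    void publish(int v) {
        data = v;
        VarHandle.releaseFence();  // ~ [StoreStore]+[LoadStore] before the store
        published = true;
        VarHandle.fullFence();     // ~ [StoreLoad], the only costly one on x86
    }

    int consume() {
        boolean p = published;
        VarHandle.acquireFence();  // ~ [LoadLoad]+[LoadStore] after the load
        return p ? data : -1;      // -1 signals "not yet published"
    }
}
```

A volatile field gives you the same fences implicitly; the explicit methods just make the diagram above visible in code.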

What Martin Thompson says is that on x86, 3 out of 4 barriers are FREE; the only one that has to be issued is StoreLoad. They are free because x86 has a strong memory model, meaning the other re-orderings do not happen by default. On other CPUs some of these barriers are quite cheap too (if I'm not mistaken, on PowerPC there's lwsync - lightweight sync; the name should be self-explanatory).

Also, there's a little buffer between the CPU and the cache - called the Store Buffer. When you write something to a variable, it does not go directly to the cache(s); it goes to that buffer first. When the buffer is full (or is forced to be drained via a StoreLoad) the writes go to the caches - and it's up to the cache coherency protocol to sync the data across all caches.

What Martin is saying is that if you have multiple writers you have to issue a StoreLoad many times - thus it is expensive. If you have a single writer, you don't have to: the buffer is simply drained when it is full. When does that happen? Well, sometimes; in theory it could be never, in practice quite fast.
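One way the single-writer idea shows up in the JDK: if only one thread ever writes, that thread can use lazySet (a release-style store, no StoreLoad fence) and readers still eventually see the value. A minimal sketch (the class is hypothetical; AtomicLong.lazySet and get are the real API):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a single-writer counter: the one writing thread avoids the
// expensive StoreLoad barrier by using lazySet instead of a full volatile set.
class SingleWriterCounter {
    private final AtomicLong counter = new AtomicLong();

    void increment() {                        // called from ONE thread only
        counter.lazySet(counter.get() + 1);   // plain read + release store
    }

    long read() {                             // any number of reader threads
        return counter.get();                 // volatile load
    }
}
```

With multiple writers this would lose updates; the cheap store is only safe because the single-writer principle removes the read-modify-write race.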

Some awesome resources (these sometimes kept me all night without sleeping, so watch out!):

There's a StoreStore, btw, every time you write to a final variable inside a constructor:

 private final int i;

 public MyObj(int i){
     this.i = i;
     // StoreStore here
 }

LazySet

Shipilev Volatile

And my all time favorite!

Upvotes: 2

Holger

Reputation: 298143

Within a single core, it doesn’t matter whether the memory access is done in-order or out-of-order. If there is a single core, it will always perceive consistent values, as read requests will be served by the same cache holding not-yet-written data.

However, that is irrelevant for Java programs, as the Java Memory Model, part of the Java Language Specification, doesn’t make such guarantees for multi-threaded programs. In fact, the term “memory barrier” doesn’t appear within the specification at all.

What you have to realize is that the Java code you have written will not be the x86/x64 code the CPU will execute. The optimized native code won’t be anything like your source code. One fundamental part of code optimization is to eliminate redundant reads and writes, or even conditional code parts, under the assumption that values do not spuriously change in-between, which is always correct for single-threaded execution.
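The classic illustration of such an optimization is a loop polling a plain boolean: the JIT may hoist the read out of the loop (it assumes no other thread changes the field), so the loop can spin forever. Declaring the flag volatile restricts exactly that optimization. A minimal sketch (class and method names are hypothetical):

```java
// With a plain boolean, the optimized code may read `running` once and spin
// forever on the cached result; `volatile` forces a fresh read per iteration
// and makes the writer's update visible.
class StopFlag {
    volatile boolean running = true;   // remove volatile: loop may never end

    void spinUntilStopped() {
        while (running) {
            Thread.onSpinWait();       // spin-loop hint, Java 9+
        }
    }
}
```

This is a visibility failure caused purely by code optimization, not by any hardware memory barrier - which is Holger's point.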

This optimized code will produce inconsistent results, if the underlying assumptions are invalidated due to multi-threaded manipulation without proper thread safe constructs. This is an accepted inconsistency, within the specification, as a memory model enforcing consistent results at all costs would result in dramatically poor performance. The thread safe constructs, like synchronization or volatile writes and reads, do not only tell the JVM where to insert memory barriers, if the underlying architecture requires it, but also, where and how to restrict the code optimizations.

This is the reason why a) proper thread safe constructs are needed when manipulating mutable shared state and b) these constructs may have performance penalties, even if there are no memory barriers needed at the CPU/hardware level.

Upvotes: 3
