Some Name

Reputation: 9521

Java volatile memory ordering and its compilation on x86-64

Consider the following simple Java application:

public class Main {
    public int a;
    public volatile int b;

    public void thread1(){
        int b;
        a = 1;
        b = this.b;
    }

    public void thread2(){
        int a;
        b = 1;
        a = this.a;
    }

    public static void main(String[] args) throws Exception {
        Main m = new Main();
        while(true){
            m.a = 0;
            m.b = 0;
            Thread t1 = new Thread(() -> m.thread1());
            Thread t2 = new Thread(() -> m.thread2());
            t1.start();
            t2.start();
            t1.join();
            t2.join();
        }
    }
}

QUESTION: Is it possible that reading into local variables will result in thread1::b = 0 and thread2::a = 0?

I could not prove from the JMM standpoint that it cannot happen, so I went down to analyzing the compiled code for x86-64.

Here is what the JIT compiler ends up with for the methods thread1 and thread2 (code unrelated to the field accesses and some comments generated by -XX:+PrintAssembly are omitted for simplicity):

thread1:

  0x00007fb030dca235: movl    $0x1,0xc(%rsi)    ;*putfield a
  0x00007fb030dca23c: mov     0x10(%rsi),%esi   ;*getfield b

thread2:

  0x00007fb030dcc1b4: mov     $0x1,%edi
  0x00007fb030dcc1b9: mov     %edi,0x10(%rsi)
  0x00007fb030dcc1bc: lock addl $0x0,0xffffffffffffffc0(%rsp) ;*putfield b 
  0x00007fb030dcc1c2: mov     0xc(%rsi),%esi    ;*getfield a

So what we have here is that the volatile read is done for free, while the volatile write requires an mfence (or, here, a lock addl) after it.

So thread1's Store can still be forwarded after the Load and therefore thread1::b = 0 and thread2::a = 0 is possible.

Upvotes: 4

Views: 317

Answers (2)

pveentjer

Reputation: 11307

Analyzing the compiled code is great if you want to know what happens at the hardware level, and @PeterCordes has given a great answer (as usual).

But if you want to reason about the correctness of a program, you need to reason in terms of the Java Memory Model. Fences are not a replacement for the JMM: they only show you what happens at the hardware level, not what the compiler is allowed to do.

public class Main {
    public int a;
    public volatile int b;

    public void thread1(){
        a = 1;                  (1)
        int r1 = b;             (2)
    }

    public void thread2(){
        b = 1;                   (3)
        int r2 = a;              (4)
    }
}

There is a data race: in the execution (1),(2),(3),(4), r2 could read the a written at (1), but there is no happens-before edge between (1) and (4), because the pair (2),(3) doesn't provide one. A volatile read followed by a volatile write of the same volatile variable does not create a happens-before edge; only a volatile write followed by a volatile read does.

Since some executions contain a data race, r2 is allowed to return either the initial value or a write it races with (a concurrent write). That gives the possible values 0 and 1.

The JMM is happens-before consistent: a read must see either the most recent write before it in the happens-before order, or a write it races with. This prevents the undefined behavior you can get in e.g. C++, where a racy read may return any value.
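To illustrate the JMM-level fix (my own sketch, not from the answer above; the class name and trial count are made up): if both fields are declared volatile, all four accesses become synchronization actions, the program is data-race-free, every execution is sequentially consistent, and the (r1, r2) = (0, 0) outcome is forbidden.

```java
// Sketch: with BOTH fields volatile the program is data-race-free,
// so the JMM guarantees sequential consistency and (0, 0) cannot occur.
public class BothVolatile {
    volatile int a;
    volatile int b;
    int r1, r2;   // Thread.join() makes these visible to the caller

    // Runs n trials and returns how often the forbidden (0, 0) outcome appeared.
    static int runTrials(int n) throws InterruptedException {
        int bothZero = 0;
        for (int i = 0; i < n; i++) {
            BothVolatile m = new BothVolatile();
            Thread t1 = new Thread(() -> { m.a = 1; m.r1 = m.b; });
            Thread t2 = new Thread(() -> { m.b = 1; m.r2 = m.a; });
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (m.r1 == 0 && m.r2 == 0) bothZero++;
        }
        return bothZero;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sequential consistency guarantees this prints 0.
        System.out.println("(0,0) outcomes: " + runTrials(10_000));
    }
}
```

In every sequentially consistent interleaving, at least one of the two loads runs after the other thread's store, so at least one of r1, r2 is 1.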

Upvotes: 3

Peter Cordes

Reputation: 364039

Yeah, your analysis looks right. This is the StoreLoad litmus test with only one of the sides having a StoreLoad barrier (like C++ std::atomic with memory_order_seq_cst, or Java volatile). A barrier is needed on both sides to shut down this possibility. See Jeff Preshing's Memory Reordering Caught in the Act for details on the case where neither side has such a barrier.

StoreLoad reordering of a=1 with b=this.b allows an effective order of

   thread1        thread2
                  b=this.b        // reads 0
    b=1
    a=this.a                      // reads 0
                  a=1

(This mess of names is why reordering litmus tests normally pick names like r0 and r1 for the "registers" holding load results, deliberately distinct from the names of the shared variables; reusing the same names makes each statement context-sensitive and a pain to follow in a reordering diagram.)

> So thread1's Store can still be forwarded after the Load and therefore thread1::b = 0 and thread2::a = 0 is possible.

It seems you mean "reordered after", not forwarded. "Forwarding" in a memory-ordering context would mean store-to-load forwarding (where a load pulls data from the store buffer before it becomes globally visible, so it sees its own stores right away, in a different order relative to other things than other threads would). But neither of your threads is reloading its own stores, so that's not happening.

x86's memory model is basically program-order + a store buffer with store-to-load forwarding, so StoreLoad reordering is the only kind that can happen.

So yes, this is about the closest you can come to ruling out ra=rb=0 while still leaving a window open for it to happen: running on a strongly-ordered ISA (x86), with a barrier on only one side.
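As a sketch of closing that window in source (my addition, not from the answer; VarHandle.fullFence() has existed since Java 9, and the class name is made up): putting a full fence between the store and the subsequent load on each side gives both paths a StoreLoad barrier, which rules out the (0, 0) outcome.

```java
import java.lang.invoke.VarHandle;

// Sketch: plain fields, but an explicit full fence (which includes StoreLoad)
// between each thread's store and its subsequent load -- on BOTH sides this time.
public class FencedDekker {
    int a, b;       // plain, non-volatile fields
    int r1, r2;     // Thread.join() makes these visible to the caller

    static int runTrials(int n) throws InterruptedException {
        int bothZero = 0;
        for (int i = 0; i < n; i++) {
            FencedDekker m = new FencedDekker();
            Thread t1 = new Thread(() -> {
                m.a = 1;
                VarHandle.fullFence();   // the store may not reorder past the load
                m.r1 = m.b;
            });
            Thread t2 = new Thread(() -> {
                m.b = 1;
                VarHandle.fullFence();
                m.r2 = m.a;
            });
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (m.r1 == 0 && m.r2 == 0) bothZero++;
        }
        return bothZero;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("(0,0) outcomes: " + runTrials(10_000));
    }
}
```

With the fence on both sides the classic Dekker argument applies: neither load can execute before its own thread's store, so both loads reading 0 would require both loads to precede both stores, which no interleaving allows.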

It's also going to be really unlikely to observe when you only make one test per thread startup; not surprised it took you 30 minutes for these executions to happen at close enough to the same time across cores to observe this. (Testing faster is non-trivial, like a 3rd thread that resets things between tests and wakes both other threads? But doing something to make it more likely that both threads reach this code at the same time could help a lot, like maybe having them both spin-wait for the same variable, so they'd likely wake within a hundred cycles of each other.)
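A rough sketch of such a harness (my own, with made-up names and counts; it assumes Java 9+ for Thread.onSpinWait): two long-lived worker threads spin-wait on a shared volatile generation counter, so each trial starts within a handful of cycles on both cores and no threads are created per trial.

```java
// Sketch of a faster litmus-test harness for the original (racy) code:
// 'a' is plain and 'b' is volatile, exactly as in the question.
public class RaceHarness {
    int a;
    volatile int b;
    volatile int r1, r2;            // trial results, published to main
    volatile int go, done1, done2;  // generation counters for lock-free handshaking

    static int runTrials(int trials) throws InterruptedException {
        RaceHarness m = new RaceHarness();
        Thread t1 = new Thread(() -> {
            for (int i = 1; i <= trials; i++) {
                while (m.go < i) Thread.onSpinWait();  // wait for trial i to start
                m.a = 1;
                m.r1 = m.b;
                m.done1 = i;                           // signal trial i finished
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 1; i <= trials; i++) {
                while (m.go < i) Thread.onSpinWait();
                m.b = 1;
                m.r2 = m.a;
                m.done2 = i;
            }
        });
        t1.start(); t2.start();
        int bothZero = 0;
        for (int i = 1; i <= trials; i++) {
            m.a = 0; m.b = 0;   // reset; published by the volatile write to go
            m.go = i;           // release both workers for trial i
            while (m.done1 < i || m.done2 < i) Thread.onSpinWait();
            if (m.r1 == 0 && m.r2 == 0) bothZero++;  // the reordered outcome
        }
        t1.join(); t2.join();
        return bothZero;
    }

    public static void main(String[] args) throws InterruptedException {
        int trials = 1_000_000;
        System.out.println("(0,0) observed in " + runTrials(trials) + " of " + trials + " trials");
    }
}
```

How often the (0, 0) outcome actually shows up is machine-dependent, but starting both threads within a few cycles of each other should make it vastly more likely than one trial per thread startup.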

Upvotes: 7
