I have been researching the cost of volatile writes in Java on x86 hardware. I'm planning to use Unsafe's putLongVolatile method on a shared memory location. Looking into the implementation, putLongVolatile gets translated to Unsafe_SetLongVolatile (link) and subsequently into an AtomicWrite followed by a fence (link).
In short, every volatile write is converted to an atomic write followed by a full fence (an mfence or a locked add instruction on x86).
Questions:
1) Why is a fence() required on x86? Isn't a simple compiler barrier sufficient, given that x86 already preserves store-store ordering? A full fence seems awfully expensive.
2) Is Unsafe's putLong a better alternative to putLongVolatile? Would it work correctly in a multithreaded setting?
Answer to question 1:
Without the full fence you do not get sequential consistency, which the JMM requires for volatile accesses.
X86 provides TSO (total store order), so you get the following barriers for free: [LoadLoad], [LoadStore] and [StoreStore]. The only one missing is [StoreLoad].
A load has acquire semantics:
r1=X
[LoadLoad]
[LoadStore]
A store has release semantics:
[LoadStore]
[StoreStore]
Y=r2
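To make the two annotated sequences concrete, here is a minimal Java sketch of message passing that relies only on acquire/release ordering, i.e. only on barriers that x86 provides for free. It assumes Java 9+ and uses java.lang.invoke.VarHandle (setRelease/getAcquire) as a public stand-in for Unsafe; ReleaseAcquireDemo and its methods are hypothetical names chosen for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Release/acquire message passing via VarHandle (Java 9+), used here as a
// public stand-in for Unsafe. Class and method names are hypothetical.
final class ReleaseAcquireDemo {
    int data;      // plain payload, published through the flag below
    boolean ready; // only accessed through READY

    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquireDemo.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Release store: [LoadStore][StoreStore] keep the data write above the flag write.
    void publish(int value) {
        data = value;
        READY.setRelease(this, true);
    }

    // Acquire load: [LoadLoad][LoadStore] keep the data read below the flag read.
    int awaitAndRead() {
        while (!((boolean) READY.getAcquire(this))) {
            Thread.onSpinWait();
        }
        return data;
    }

    // Publishes `value` from the calling thread and returns what a reader thread saw.
    static int roundTrip(int value) {
        ReleaseAcquireDemo demo = new ReleaseAcquireDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.awaitAndRead());
        reader.start();
        demo.publish(value);
        try {
            reader.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```

Nothing here needs a [StoreLoad] barrier: the writer only stores and the reader only loads.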
If you do a store followed by a load, you end up with this:
[LoadStore]
[StoreStore]
Y=r2
r1=X
[LoadLoad]
[LoadStore]
The issue is that nothing between Y=r2 and r1=X stops the store and the subsequent load from being reordered, so the execution isn't sequentially consistent, and the Java Memory Model mandates sequential consistency for volatile accesses. The only way to prevent this is with a [StoreLoad] barrier. The most logical place to add it is after the write, since reads are normally more frequent than writes.
This can be accomplished with an MFENCE or with a locked instruction such as lock addl $0,(%rsp).
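The store-then-load reordering described above is what the classic store-buffering (Dekker) litmus test probes. The following is a rough Java sketch, assuming Java 9+: it uses VarHandle release stores and acquire loads in place of Unsafe, and VarHandle.fullFence() as an explicit stand-in for the [StoreLoad] barrier placed after the write. StoreLoadLitmus and its methods are hypothetical names. With the fence, the outcome r1 == 0 && r2 == 0 should never appear; without it, the outcome is allowed, but whether a given run actually shows it depends on the hardware and on timing.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.CyclicBarrier;

// Store-buffering litmus test. Hypothetical names; VarHandle (Java 9+) stands
// in for Unsafe, and fullFence() plays the role of the [StoreLoad] barrier.
final class StoreLoadLitmus {
    int x, y;   // shared variables, accessed only through X and Y
    int r1, r2; // values loaded by each thread; read by the caller after join()

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            X = lookup.findVarHandle(StoreLoadLitmus.class, "x", int.class);
            Y = lookup.findVarHandle(StoreLoadLitmus.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Y=1; r1=X. Without the fence only release/acquire ordering applies,
    // so the load may be satisfied before the store becomes visible.
    void thread1(boolean fenced) {
        Y.setRelease(this, 1);
        if (fenced) VarHandle.fullFence(); // [StoreLoad] placed after the write
        r1 = (int) X.getAcquire(this);
    }

    // X=1; r2=Y, the mirror image of thread1.
    void thread2(boolean fenced) {
        X.setRelease(this, 1);
        if (fenced) VarHandle.fullFence();
        r2 = (int) Y.getAcquire(this);
    }

    // Counts runs ending in r1 == 0 && r2 == 0, which sequential consistency forbids.
    static int countBothZero(int trials, boolean fenced) {
        int bothZero = 0;
        for (int i = 0; i < trials; i++) {
            StoreLoadLitmus t = new StoreLoadLitmus();
            CyclicBarrier start = new CyclicBarrier(2); // release both threads together
            Thread a = new Thread(() -> { await(start); t.thread1(fenced); });
            Thread b = new Thread(() -> { await(start); t.thread2(fenced); });
            a.start();
            b.start();
            join(a);
            join(b);
            if (t.r1 == 0 && t.r2 == 0) bothZero++;
        }
        return bothZero;
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) { // InterruptedException or BrokenBarrierException
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("release/acquire only: " + countBothZero(10_000, false));
        System.out.println("with fullFence():     " + countBothZero(10_000, true));
    }
}
```

Starting fresh threads for every trial keeps the race window small, so the unfenced count may well be zero on a given machine; a dedicated harness such as OpenJDK's jcstress is the more reliable way to run litmus tests like this one.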
Answer to question 2:
The problem with putLong is that not only can the CPU reorder instructions, the compiler can also transform the code in ways that amount to reordering memory accesses.
For example, if you do a putLong in a loop, the compiler could decide to pull the write out of the loop, so the updates would not become visible to other threads while the loop runs. If you want a low-overhead single-writer performance counter, you might want to look at putLongRelease (called putOrderedLong in older versions). This prevents the compiler from doing the above trick, and on x86 you get the release semantics for free.
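A minimal sketch of such a single-writer counter, assuming Java 9+ and using VarHandle.setRelease as the public counterpart of putLongRelease/putOrderedLong, with getAcquire on the reading side. SingleWriterCounter and countTo are hypothetical names. In the demo method, the calling thread spins until it observes the writer's final value and checks that the values it sees never go backwards.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical single-writer counter: exactly one thread may call increment().
// setRelease stands in for putLongRelease / putOrderedLong (an assumption of
// this sketch); getAcquire lets any other thread sample the current value.
final class SingleWriterCounter {
    private long count; // written only by the owning thread, via COUNT

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(SingleWriterCounter.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plain read of our own field plus a release store: no locked
    // read-modify-write and no trailing [StoreLoad] fence.
    void increment() {
        COUNT.setRelease(this, count + 1);
    }

    long get() {
        return (long) COUNT.getAcquire(this);
    }

    // Starts a writer that counts to n; the calling thread spins until it sees n.
    static long countTo(long n) {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(() -> {
            for (long i = 0; i < n; i++) counter.increment();
        });
        writer.start();
        long last = 0;
        while (last < n) {
            long now = counter.get();
            if (now < last) throw new IllegalStateException("counter went backwards");
            last = now;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(countTo(1_000_000L)); // prints 1000000
    }
}
```

Because only one thread ever writes count, the increment can be a plain read followed by a release store instead of an atomic read-modify-write; with several writers, updates could be lost.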
But it is very difficult to give a one-size-fits-all answer to your second question, because it depends on what your goal is.