Reputation: 3598

Cost of volatile writes

I have been researching the cost of volatile writes in Java on x86 hardware. I'm planning on using Unsafe's putLongVolatile method on a shared memory location. Looking into the implementation, putLongVolatile gets translated to Unsafe_SetLongVolatile (Link) and subsequently into an atomic write followed by a fence (Link).

In short, every volatile write is converted to an atomic write followed by a full fence (an mfence or a locked add instruction on x86).
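For comparison, the three store flavours discussed in this thread can also be expressed with the standard java.lang.invoke.VarHandle API (Java 9+) instead of Unsafe. This is a minimal sketch: the class and method names are hypothetical, and the mapping (set ≈ putLong, setRelease ≈ putLongRelease/putOrderedLong, setVolatile ≈ putLongVolatile) is an assumption about the closest standard equivalents, not something stated above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder for a shared long that can be written in three different modes.
class SharedLong {
    private long value; // plain field; ordering is chosen per access via the VarHandle

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(SharedLong.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Plain store: no ordering guarantees (closest to Unsafe.putLong).
    void writePlain(long v) { VALUE.set(this, v); }

    // Release store: earlier loads/stores may not move after it; no StoreLoad barrier.
    void writeRelease(long v) { VALUE.setRelease(this, v); }

    // Volatile store: sequentially consistent (closest to Unsafe.putLongVolatile).
    void writeVolatile(long v) { VALUE.setVolatile(this, v); }

    long readVolatile() { return (long) VALUE.getVolatile(this); }
}
```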

Questions:

1) Why is a fence() required on x86? Isn't a simple compiler barrier sufficient, given that x86 already preserves store-store ordering? A full fence seems awfully expensive.

2) Is Unsafe's putLong a better alternative to putLongVolatile? Would it work correctly in a multi-threaded setting?
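A deliberately crude way to get a feel for the cost raised in question 1 is to time a loop of volatile stores against a loop of release stores. This is only a sketch under several assumptions: it uses VarHandle setVolatile/setRelease as stand-ins for Unsafe's putLongVolatile/putLongRelease, the class name is hypothetical, and a hand-rolled loop like this is subject to JIT effects, so JMH is the right tool for real numbers. To inspect the instructions HotSpot actually emits, it can print them with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (this requires the hsdis disassembler plugin).

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical, deliberately crude timing sketch; use JMH for trustworthy measurements.
class VolatileWriteCost {
    private long value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(VolatileWriteCost.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // n volatile stores; per the question, on x86 each is followed by a full fence.
    long timeVolatile(long n) {
        long start = System.nanoTime();
        for (long i = 1; i <= n; i++) {
            VALUE.setVolatile(this, i);
        }
        return System.nanoTime() - start;
    }

    // n release stores; on x86 these need no fence instruction.
    long timeRelease(long n) {
        long start = System.nanoTime();
        for (long i = 1; i <= n; i++) {
            VALUE.setRelease(this, i);
        }
        return System.nanoTime() - start;
    }

    long current() { return (long) VALUE.getVolatile(this); }

    public static void main(String[] args) {
        VolatileWriteCost c = new VolatileWriteCost();
        long n = 50_000_000L;
        for (int round = 0; round < 5; round++) { // repeat so both loops get JIT-compiled
            System.out.printf("volatile: %d ms, release: %d ms%n",
                    c.timeVolatile(n) / 1_000_000, c.timeRelease(n) / 1_000_000);
        }
    }
}
```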

Upvotes: 2

Answers (1)

pveentjer

Reputation: 11307

Answer to question 1:

Without the full fence you do not get sequential consistency, which the JMM requires for volatile accesses.

X86 provides TSO (total store order), so you get the following barriers for free: [LoadLoad], [LoadStore] and [StoreStore]. The only one missing is [StoreLoad].

A load has acquire semantics

r1=X
[LoadLoad]
[LoadStore]

A store has release semantics

[LoadStore]
[StoreStore]
Y=r2
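These acquire/release barriers are exactly what the common "publish, then consume" pattern needs, which is why it costs nothing extra on x86. A minimal sketch, using VarHandle setRelease/getAcquire as a stand-in for Unsafe (class and method names are hypothetical): the writer stores a payload and then sets a flag with release semantics; a reader that sees the flag with acquire semantics is guaranteed to see the payload.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical one-shot mailbox illustrating release/acquire message passing.
class Mailbox {
    private long payload;  // plain field, published through the flag
    private boolean ready; // accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Mailbox.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer: [LoadStore][StoreStore] before the flag store keeps the payload write ahead of it.
    void publish(long value) {
        payload = value;
        READY.setRelease(this, true);
    }

    // Reader: [LoadLoad][LoadStore] after the flag load keeps the payload read behind it.
    long awaitPayload() {
        while (!(boolean) READY.getAcquire(this)) {
            Thread.onSpinWait(); // spin until the flag is observed
        }
        return payload;
    }

    public static void main(String[] args) throws InterruptedException {
        Mailbox box = new Mailbox();
        long[] seen = new long[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42L);
        reader.join();
        System.out.println(seen[0]); // 42
    }
}
```

Note that no [StoreLoad] is involved here: the pattern never depends on a store being ordered before a later load.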

If you do a store followed by a load, you end up with this:

[LoadStore]
[StoreStore]
Y=r2
r1=X
[LoadLoad]
[LoadStore]

The issue is that the store and the subsequent load can still be reordered (the load can be satisfied while the store is still sitting in the CPU's store buffer), so the execution isn't sequentially consistent, and sequential consistency is mandatory for volatile accesses under the Java Memory Model. The only way to prevent this is with a [StoreLoad]. The most logical place to add it is after the volatile write, since reads are normally more frequent than writes.

This can be accomplished with an MFENCE or a lock addl $0,(%rsp).
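The store-followed-by-load case is the classic store-buffering (Dekker-style) litmus test. A rough sketch with hypothetical names: each thread stores 1 to its own variable and then loads the other one. With volatile fields the JMM forbids the outcome where both loads return 0; with plain fields x86 may produce it, because each load can complete before the other thread's store leaves its store buffer. Whether the plain case actually shows it in a given run depends on timing, so treat that count as anecdotal; a harness such as jcstress is the proper tool for this kind of test.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical store-buffering litmus test:
// each thread stores 1 to its own variable, then loads the other thread's variable.
class StoreBuffering {
    static volatile int vx, vy; // volatile: the JMM forbids both loads returning 0
    static int px, py;          // plain: on x86 a load may overtake the earlier store

    // Runs the test `rounds` times and counts outcomes where both threads read 0.
    static int countBothZero(int rounds, boolean useVolatile) {
        int bothZero = 0;
        for (int i = 0; i < rounds; i++) {
            vx = 0; vy = 0; px = 0; py = 0;
            int[] r = new int[2];
            CyclicBarrier start = new CyclicBarrier(2); // release both threads together
            Thread t1 = new Thread(() -> {
                await(start);
                if (useVolatile) { vx = 1; r[0] = vy; } else { px = 1; r[0] = py; }
            });
            Thread t2 = new Thread(() -> {
                await(start);
                if (useVolatile) { vy = 1; r[1] = vx; } else { py = 1; r[1] = px; }
            });
            t1.start();
            t2.start();
            join(t1);
            join(t2);
            if (r[0] == 0 && r[1] == 0) bothZero++;
        }
        return bothZero;
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("volatile, both 0: " + countBothZero(10_000, true));  // always 0
        System.out.println("plain,    both 0: " + countBothZero(10_000, false)); // may be > 0
    }
}
```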

Answer to question 2:

The problem with putLong is that not only can the CPU reorder instructions, the compiler can also transform the code in ways that reorder memory accesses.

For example, if you do a putLong in a loop, the compiler could decide to pull the write out of the loop, so the value would not become visible to other threads while the loop runs. If you want a low-overhead single-writer performance counter, have a look at putLongRelease (putOrderedLong is the older name). This prevents the compiler from doing the above trick, and on x86 the release semantics come for free.
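A minimal sketch of such a single-writer counter, using VarHandle.setRelease as the standard-API counterpart of putLongRelease/putOrderedLong (this swap and all names are assumptions). Only one thread may call increment(); because the store is a release store rather than a plain one, the JIT may not sink it out of the loop, so readers on other threads see the count advance without the writer paying for a [StoreLoad].

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical single-writer counter: exactly one thread increments, any thread may read.
class SingleWriterCounter {
    private long count;

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(SingleWriterCounter.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Owner thread only: the read-modify-write is not atomic, which is fine with one writer.
    void increment() {
        long next = (long) COUNT.get(this) + 1; // plain read: only this thread writes
        COUNT.setRelease(this, next);           // release store, no StoreLoad fence
    }

    // Safe to call from any thread.
    long get() { return (long) COUNT.getAcquire(this); }

    public static void main(String[] args) throws InterruptedException {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) counter.increment();
        });
        writer.start();
        long seen;
        while ((seen = counter.get()) < 1_000_000) {
            Thread.onSpinWait(); // the reader observes progress while the writer runs
        }
        writer.join();
        System.out.println(seen); // 1000000
    }
}
```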

But it is very difficult to give a one-size-fits-all answer to your second question, because it depends on what your goal is.

Upvotes: 2
