TSK

Reputation: 751

Why release sequence can only contain read-modify-write but not pure write

After a release operation A is performed on an atomic object M, the longest continuous subsequence of the modification order of M that consists of:

  1. Writes performed by the same thread that performed A (until C++20)
  2. Atomic read-modify-write operations made to M by any thread

is known as the release sequence headed by A.

Upvotes: 6

Views: 248

Answers (1)

Dhwani Katagade

Reputation: 1290

This is not a complete answer; it is based on notes from my current newbie understanding of the matter. I have picked up bits from some of the comments on this question and also from other sources. I am posting them here in case they help someone, and to help clarify my own understanding further.

Q1: Why do we need the concept of release sequence?

This is already answered excellently in What does "release sequence" mean?. The following points capture my current understanding.

  • Release sequence is relevant in the case of release/acquire synchronization on an atomic variable
  • There are scenarios where an older store release would need to synchronize with a much later load acquire
  • Release sequence allows for intervening modification operations on the atomic to not break this synchronization
  • Without the support of release sequences such cases would have to use the more costly sequential consistency model
  • The facility provided by release sequences is relevant for example in the implementation of reference counting
    • The resource creating thread will create the resource and then store release the reference counter from 0 to 1
    • Multiple intervening threads may increment or decrement the reference counter with read-modify-write operations
    • The final thread decrements the counter to 0 with a read-modify-write acquire
    • This thread needs to clean up the created resource, and release-sequence synchronization ensures the resource is visible to it

Q2: Is the first item removed in C++20?

Yes. As explained in the paper P0982R1: Weaken Release Sequences, the following were the reasons behind the removal.

  • Inclusion of "atomic stores by the same thread that did the initial store release" in the release sequence was originally considered a free add-on, given the majority of then-prevalent architectures
    • Discussions over time revealed that there weren't any compelling valid use cases for this inclusion
    • It was also found to impose unwanted constraints on newer, weaker architectures that were building conformance to the C++ standard
    • "Atomic stores by the same thread" was also considered a brittle requirement
      • Subsequent unrelated modifications to the code could inadvertently break it
      • For example, a change down the call tree that now uses async/task could run the store on a separate thread
  • In the end it was judged to bring complications for newer architectures without any compelling use cases to justify them
  • It was agreed to remove this inclusion even though it was technically a breaking change from previous behaviour

That said, this still doesn't fix all outstanding problems with release sequences as highlighted in the paper P0668R2: Revising the C++ memory model.

  • The synchronization achieved through C++20 release sequences can still get broken by an unrelated thread doing a non-RMW write to the atomic variable
    • All related threads may use read-modify-writes correctly until the final thread does an acquire; the expected synchronization completes and value semantics are maintained
    • But if some unrelated thread performs a non-RMW store on the atomic somewhere along that chain, the chain can get broken
  • This is considered bad because it makes C++ code hard to reason about
    • Existing code is correct and works correctly
    • Someone adds new code with a new, unrelated thread that does a non-RMW write to the same variable
    • This is an extension of the code (not a modification of existing code)
    • General software engineering principles recommend favouring extension over modification
    • If an extension breaks existing behaviour, that's not good
There is good reason why Herb Sutter says "Non default atomics.. don't go there." Knowing all these nuances and managing them is expecting just too much from developers. But even if one ventures into the "with caution" territory, there are recommended ways to avoid such surprises. It's best to encapsulate away all non-default atomics and manage any change with extra review. This is also anecdotally mentioned in part 2 of Herb Sutter's talk.

Q3: Why do read-modify-write operations qualify in a release sequence while pure write operations don't? What's special about relaxed RMWs that lets them form a chain without being an acquire load or a release store?

I don't have much understanding of individual architectures' hardware behaviours. All the same, the following are my conceptual, intuition-based notes in case they help someone. To understand how read-modify-write operations differ from normal stores for the purposes of a release sequence, I use the following line of thought.

  • The way I intuitively see it, the requirement of release sequences is to
    • Maintain the semantic chaining of values across intervening accumulative modifications
      • Maintain the meaning of the first store release when it is followed by non-overwriting stores
      • For example, maintain a counter with increments and decrements without losing count
      • For example, logically OR multiple flags into a status word as the flags are set one by one
    • And maintain the side-effect visibility of the release heading the chain, all the way down the chain
  • Read-modify-write operations that read a variable and then modify it based on the read value capture this semantic chaining
  • If all modifications to an atomic after the first release store are done only by read-modify-write operations then
    • All intervening read-modify-write operations form a semantic value chain that maps to the variable's total modification order
    • A read-modify-write operation by any thread is required to read the value just before this operation's write in the variable's modification order. This means the read-modify-write always overwrites the value that it read and modified, which conceptually maintains the semantic value chain (See footnote 1)
    • A read-modify-write operation is also atomic making it uninterruptible and indivisible between the read and write
    • The limited, stricter kind of modification required for semantic value chaining is exactly what read-modify-write operations represent
  • Use of read-modify-write operations for preserving the semantic chaining of values is not foolproof though
    • If any non-RMW store takes place between any of the read-modify-write operations then the chain is broken
    • But with carefully done code it is still possible to maintain semantic chaining of values by using only RMW operations
    • Runtime non-determinism will not break the value chaining semantics (See footnote 1)
  • If instead we try to replace the RMW operations with a load-acquire and store-release pair on the same thread
    • At first look the pair might appear to mimic the value-chaining semantics
    • But the read and write are not pairwise atomic, and can get interleaved with reads and writes from other threads' pairs
    • Essentially, release/acquire synchronization is a runtime phenomenon, not something determined at coding time
    • Even allowing intervening modifications via non-RMW stores would cause an overwrite and break the chain
  • Allowing intervening modifications via read-modify-write operations works well due to the following properties
    • Pairing of a read and write naturally maintains the semantic chaining of values
    • No other write can happen between the read and write which would otherwise break the semantic chaining of values
  • As defined by the C++20 release sequence
    • Relaxed read-modify-write operations will continue the release sequence, even while they don't form a synchronization point
    • Acquire read-modify-write operations will similarly continue the release sequence, but also form a synchronization point

Footnote 1: As pointed out by Peter Cordes, it's common to misread the standard text regarding the value that an RMW operation will read. It might seem intuitive that an RMW will read the "last written" value of the atomic, but the catch is that which value was the last written, for this RMW, is evident only after the operation is complete. Here "complete" means it has found its place in the variable's modification order and all threads have a consensus about it. The exact order of values in the variable's modification order is a result of runtime non-determinism, and not something we can in general make assumptions about, even if we have second-hand evidence of a certain value being visible. All the same, in the use case under discussion, for example keeping a counter, it does not matter where in the modification order a certain increment or decrement lands. What matters is that the count stays correct. The RMWs should keep read, modify, write, read, modify, write in lock step and not end up with read, modify, read, modify, write, write, which would mess up the count. This much is ensured by the standard text, as long as we don't break the semantic value chain by allowing an intervening pure store.

Footnote 2: I have been reading around trying to get my head around the puzzling world of memory ordering. From my current level of understanding I feel the standard tries to formalize some order and visibility promises to the developers because the developers need them for genuine reasons to support important use cases. The promise might be natural for one architecture to provide, depending on the design goals it prioritizes and the mechanisms it uses, while it might not be straightforward for some other architecture, either available today or a future one. Knowing the details of how most architectures fulfill the promise is a good curiosity question, but it in itself should not be used as justification for why the promise is provided by the standard in the first place. What minimal promises are formalized by the standard should be driven by what is the genuine minimal ask from the developer community, not what is already easily available on some architecture. From that perspective I feel how some architecture achieves a certain promise does not have a direct bearing on which promises are granted by the standard. In this attempt of an answer I have tried to intuitively explore the semantic 'oughtas' and 'shouldas' of why a certain promise is supported by the standard the way it is.

Upvotes: 0
