Reputation: 141
I'd like to write a C++ lock-free object where there are many logger threads logging to a large global (non-atomic) ring buffer, with an occasional reader thread which wants to read as much data in the buffer as possible. I ended up having a global atomic counter where loggers get locations to write to, and each logger increments the counter atomically before writing. The reader tries to read the buffer and per-logger local (atomic) variable to know whether particular buffer entries are busy being written by some logger, so as to avoid using them.
So I have to do synchronization between a pure reader thread and many writer threads. I sense that the problem can be solved without using locks, and I can rely on "happens after" relation to determine whether my program is correct.
I've tried relaxed atomic operation, but it won't work: atomic variable stores are releases and loads are acquires, and the guarantee is that some acquire (and its subsequent work) always "happen after" some release (and its preceding work). That means there is no way for the reader thread (doing no store at all) to guarantee that something "happens after" the time it reads the buffer, which means I don't know whether some logger has overwritten part of the buffer when the thread is reading it.
So I turned to sequential consistency. For me, "atomic" means Boost.Atomic, which notion of sequential consistency has a "pattern" documented:
The third pattern for coordinating threads via Boost.Atomic uses seq_cst for coordination: If ...
- thread1 performs an operation A,
- thread1 subsequently performs any operation with seq_cst,
- thread1 subsequently performs an operation B,
- thread2 performs an operation C,
- thread2 subsequently performs any operation with seq_cst,
- thread2 subsequently performs an operation D,
then either "A happens-before D" or "C happens-before B" holds.
Note that the second and fifth lines say "any operation", without saying whether it modify anything, or what it operates on. This provides the guarantee that I wanted.
All is happy until I watch the talk of Herb Sutter titled "atomic<> Weapnos". What he implies is that seq_cst is just a acq_rel, with the additional guarantee of consistent atomic stores ordering. I turned to the cppreference.com, which have similar description.
So my questions:
Upvotes: 3
Views: 364
Reputation: 141
I saw no answer here, so I asked again in the Boost user mailing list. I saw no answer there either (apart from a suggestion to look into Boost lockfree), so I planed to ask Herb Sutter (expecting no answer anyway). But before doing that, I Googled "C++ memory model" a little more deeply. After reading a page of Hans Boehm (http://www.hboehm.info/c++mm/), I could answer most of my own question. I Googled a bit more, this time for "C++ Data Race", and landed at a page by Bartosz Milewski (http://bartoszmilewski.com/2014/10/25/dealing-with-benign-data-races-the-c-way/). Then I can answer even more of my own question. Unluckily, I still don't know how to do what I want to do given that knowledge. Perhaps what I want to do is actually unachieveable in standard C++.
My first part of the question: "Does C++11 and Boost.Atomic implement the same memory model?" The answer is, mostly, "yes". My second part of the question: "If (1) is 'yes', does it mean the "pattern" described by Boost is somehow implied by the C++11 memory model?" The answer is again, yes. "How?" is answered by a proof found here (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2392.html). Essentially, for data race free programs, the little bit added to acq_rel is sufficient to guarantee the behavior required by seq_cst. So both documentation, although perhaps confusing, are correct.
Now the real problem: although both (1) and (2) get "yes" answers, my original program is wrong! I neglected (actually, I'm unaware of) an important rule of C++: a program with data race has undefined behavior (rather than an "unspecified" or "implementation defined" one). That is, the compiler guarantees behavior of my program only if my program has absolutely no data race. Without a lock, my program contains a data race: the pure reader thread can read any time, even at a time when the logger thread is busy writing. This is "undefined behavior", and the rule says that the computer can do anything (the "catch fire" rule). To fix it, one has to use ideas found in the page of Bartosz Milewski I mentioned earlier, i.e., change the ring buffer to contain only atomic content, so that the compiler knows that its ordering is important and must not be reordered with the operations marked to require sequential consistency. If overhead minimization is desired, one can write to it using relaxed atomic operations.
Unluckily, this applies to the reader thread too. I can no longer just "memcpy" the whole memory buffer. Instead I must also use relaxed atomic operations to read the buffer, one word after another. This kills performance, but I have no choice actually. Luckily for me, the dumper's performance is not important to me at all: it rarely gets run anyway. But if I do want the performance of "memcpy", I would get an answer of "no solution": C++ provides no semantics of "I know there is data race, you can return anything to me here but don't screw up my program". Either you ensure that there is no data race and pay the cost to get everything well defined, or you have a data race and the compiler is allowed to put you to jail.
Upvotes: 1