Reputation: 79
Let us consider the following one-reader/one-writer queue implemented with a linked list.
struct queue {
    queue() {
        tail = head = &reserved;
        n = 0;
    }
    void push(item *it) {
        tail->next = it;
        tail = it;
        n++;
    }
    item* pop() {
        while (n == used)
            ;
        ++used;
        head = head->next;
        return head;
    }
    item reserved;
    item *tail, *head;
    int used = 0;
    std::atomic<int> n;
};
Now I find that using `volatile int n` makes my writer run faster, but I'm not sure whether it guarantees that `head = head->next` will always read the correct value.
UPDATED: What if I add an atomic operation between `tail->next = it` and `n++`, i.e.,

void push(item *it) {
    tail->next = it;
    tail = it;
    a.store(0, std::memory_order_release);
    a.load(std::memory_order_acquire);
    n++;
}

in which `a` is never accessed by the reader? Will this guarantee the ordering of `tail->next = it` and `head = head->next`? (It still runs faster than using an atomic `n`.)
Upvotes: 1
Views: 283
Reputation: 2949
As already pointed out in several other comments, `volatile` is unrelated to multithreading, so it should not be used here. However, the reason the volatile version performs better than the atomic one is simply that with a volatile, `++n` translates to plain load, inc and store instructions, while with an atomic it translates to a more expensive `lock xadd` (assuming you compile for x86).
But since this is only a single-reader/single-writer queue, you don't need expensive read-modify-write operations:
struct queue {
    queue() {
        tail = head = &reserved;
        n = 0;
    }
    void push(item *it) {
        tail->next = it;
        tail = it;
        auto new_n = n.load(std::memory_order_relaxed) + 1;
        n.store(new_n, std::memory_order_release);
    }
    item* pop() {
        while (n.load(std::memory_order_acquire) == used)
            ;
        ++used;
        head = head->next;
        return head;
    }
    item reserved;
    item *tail, *head;
    int used = 0;
    std::atomic<int> n;
};
This should perform just as well as the volatile version. If the acquire-load in `pop` "sees" the value written by the release-store in `push`, the two operations synchronize with each other, thereby establishing the required happens-before relation.
Upvotes: 0
Reputation: 2193
The `volatile` keyword in C++ does not guarantee that reads and writes are ordered as they appear in the code in multi-threaded environments. So in your code, with the counter stripped of its atomic wrapper and made merely `volatile`, an increment of the counter observed by the consumer thread does not guarantee that `item::next` has also been updated.
To achieve maximum performance with that guarantee, I think you at least have to insert a write barrier between updating `tail->next` and the increment of the counter, e.g. via `n.fetch_add(1, std::memory_order_release)`, and a read barrier just before fetching `head->next`, like `n.load(std::memory_order_acquire)`. I don't know the CPU-arch-specific details, though.
Upvotes: 2