Reputation: 698
What is the most correct way to relax the synchronization on the loads of valA and valB in ThreadMethodOne (assuming there is no false cache-line sharing between valA and valB)? It seems that I should not simply change ThreadMethodOne to use memory_order_relaxed for the load of valA: the memory_order_acquire on valB.load would no longer prevent the compiler from moving the valA.load after the valB.load. It also seems that I can't use memory_order_relaxed on the valB.load, since it would then no longer synchronize with the fetch_add in ThreadMethodTwo. Would it be better to swap the two loads and relax the load of valA?
Is this the correct change?
nTotal += valB.load(std::memory_order_acquire);
nTotal += valA.load(std::memory_order_relaxed);
Looking at the results on Compiler Explorer, the generated code for ThreadMethodOne appears to be the same when using memory_order_relaxed for either valA or valB, even when I don't swap the order of the loads. I also see that the memory_order_relaxed fetch_add in ThreadMethodTwo still compiles to the same code as the memory_order_release one. Replacing the relaxed fetch_add with the following line seems to turn it into a non-lock add: 'valA.store(valA.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);' But I don't know if this is better.
Full program:
#include <stdio.h>
#include <stdlib.h>
#include <thread>
#include <atomic>
#include <unistd.h>
bool bDone { false };
std::atomic_int valA {0};
std::atomic_int valB {0};
void ThreadMethodOne()
{
while (!bDone)
{
int nTotal {0};
nTotal += valA.load(std::memory_order_acquire);
nTotal += valB.load(std::memory_order_acquire);
printf("Thread total %d\n", nTotal);
}
}
void ThreadMethodTwo()
{
while (!bDone)
{
valA.fetch_add(1, std::memory_order_relaxed);
valB.fetch_add(1, std::memory_order_release);
}
}
int main()
{
std::thread tOne(ThreadMethodOne);
std::thread tTwo(ThreadMethodTwo);
usleep(100000);
bDone = true;
tOne.join();
tTwo.join();
int nTotal = valA.load(std::memory_order_acquire);
nTotal += valB.load(std::memory_order_acquire);
printf("Completed total %d\n", nTotal);
}
A better sample (leaving the original one in place, since it was the one discussed in the comments):
#include <stdio.h>
#include <stdlib.h>
#include <thread>
#include <atomic>
#include <unistd.h>
std::atomic_bool bDone { false };
std::atomic_int valA {0};
std::atomic_int valB {0};
void ThreadMethodOne()
{
while (!bDone)
{
int nTotalA = valA.load(std::memory_order_acquire);
int nTotalB = valB.load(std::memory_order_relaxed);
printf("Thread total A: %d B: %d\n", nTotalA, nTotalB);
}
}
void ThreadMethodTwo()
{
while (!bDone)
{
valB.fetch_add(1, std::memory_order_relaxed);
valA.fetch_add(1, std::memory_order_release);
}
}
int main()
{
std::thread tOne(ThreadMethodOne);
std::thread tTwo(ThreadMethodTwo);
usleep(100000);
bDone = true;
tOne.join();
tTwo.join();
int nTotalA = valA.load(std::memory_order_acquire);
int nTotalB = valB.load(std::memory_order_relaxed);
printf("Completed total A: %d B: %d\n", nTotalA, nTotalB);
}
Upvotes: 0
Views: 199
Reputation: 6791
After cleaning up your code (see my comment), we get something like:
#include <atomic>
#include <iostream>
#include <thread>
std::atomic_int valA {0};
std::atomic_int valB {0};
void ThreadMethodOne()
{
int nTotalA = valA.load(std::memory_order_acquire);
int nTotalB = valB.load(std::memory_order_relaxed);
std::cout << "Thread total A: " << nTotalA << " B: " << nTotalB << '\n';
}
void ThreadMethodTwo()
{
valB.fetch_add(1, std::memory_order_relaxed);
valA.fetch_add(1, std::memory_order_release);
}
int main()
{
std::thread tOne(ThreadMethodOne);
std::thread tTwo(ThreadMethodTwo);
tOne.join();
tTwo.join();
int nTotalA = valA.load(std::memory_order_acquire);
int nTotalB = valB.load(std::memory_order_relaxed);
std::cout << "Completed total A: " << nTotalA << " B: " << nTotalB << '\n';
}
The possible outcomes of this program are:
Thread total A: 0 B: 0
Completed total A: 1 B: 1
or
Thread total A: 0 B: 1
Completed total A: 1 B: 1
or
Thread total A: 1 B: 1
Completed total A: 1 B: 1
The reason that it always prints Completed total A: 1 B: 1
is that thread 2 was joined and thus finished, which added 1 to each variable, and the loads in thread 1 have no influence on that.
If thread 1 runs and completes in its entirety before thread 2 then it will obviously print 0 0, while if thread 2 runs and completes in its entirety before thread 1 then thread 1 will print 1 1. Note how doing a memory_order_acquire load in thread 1 doesn't enforce anything. It can easily read the initial value of 0.
If the threads run more or less at the same time then the outcome of 0 1 is also quite trivial: thread 1 might execute its first line, then thread 2 executes both of its lines, and finally thread 1 reads the value written by thread 2 to valB (it doesn't have to because it is relaxed, but in that case we just get the 0 0 output; at the very least it is possible, however, that it will read 1, if we wait long enough).
So, the only question of interest is: why don't we see an output of 1 0?
The reason is that if thread 1 reads a value 1 for valA then that has to be the value written by thread 2. Here the write whose value is read is a write release, while the read itself is a read acquire. This causes a synchronization to happen: every side effect of thread 2 that happened before the write release becomes visible to every memory access in thread 1 after the read acquire. In other words, if we read valA==1 then the subsequent read of valB (relaxed or not) will see the write to valB of thread 2 and thus always see a 1 and never a 0.
Unfortunately I cannot say more about this because your question is very unclear: I don't know what you expected the outcome to be, or want it to be, so I can say nothing about the memory ordering requirements for that to happen.
Upvotes: 1