Reputation: 3
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include <mutex>

using namespace std::chrono;

const int nthreads = 4;
const int64_t ndata = 1600000;

int total_sum = 0;   // shared by all worker threads

void compute_sum(int tid, std::vector<int>& d)
{
    int st = (ndata / nthreads) * tid;
    int en = (ndata / nthreads) * (tid + 1);
    for (int i = st; i < en; i++) {
        total_sum += d[i];   // unsynchronized read-modify-write: data race
    }
}

int main(int argc, char **argv)
{
    std::vector<std::thread> threads;
    std::vector<int> data;
    for (int i = 0; i < ndata; i++) {
        data.push_back(i);
    }
    for (int i = 0; i < nthreads; i++) {
        threads.push_back(std::thread(compute_sum, i, std::ref(data)));
    }
    for (auto &th : threads) {
        th.join();
    }
    return 0;
}
Here total_sum
is shared between multiple threads, so a data race occurs and the result is wrong (unless atomics or locks are used).
But when hardware-level coherency such as the MESI protocol is implemented, shouldn't shared memory be handled properly by the hardware itself, so that no race condition occurs?
Upvotes: 0
Views: 110
Reputation: 1592
The big gap between the C++ abstract machine and the physical hardware is that, without atomics or memory fences, the abstract machine is allowed to assume that a variable is not accessed by anything except the current thread. That means the compiler is allowed to generate code that reads total_sum
into a register, does all of the summing on that register, and only commits the value back to memory after the loop.
When that happens, any writes other threads made to that memory between the initial read and the final write-back are never seen, or are simply overwritten.
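As a rough sketch (an illustration of the transformation the compiler is allowed to make, not the code any particular compiler actually emits; the function name is mine):

void compute_sum_as_compiled(int tid, std::vector<int>& d)
{
    int st = (ndata / nthreads) * tid;
    int en = (ndata / nthreads) * (tid + 1);

    int reg = total_sum;            // load the shared variable once into a register
    for (int i = st; i < en; i++) {
        reg += d[i];                // all the summing happens on the register
    }
    total_sum = reg;                // one store after the loop; anything the other
                                    // threads stored in the meantime is lost
}

Each thread ends up publishing only its own partial sum, clobbering whatever the other threads wrote.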
Once you add atomics, the operations must always operate on the memory itself and become globally visible as soon as they happen. If you add mutexes or other locking, the threads must commit their results back to memory before releasing the lock and must re-read from memory after acquiring it, forcing a globally consistent sequence of operations.
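For example, a minimal sketch of the atomic version (the atomic_total_sum name and function name are mine, not from the question):

#include <atomic>

std::atomic<int> atomic_total_sum{0};

void compute_sum_atomic(int tid, std::vector<int>& d)
{
    int st = (ndata / nthreads) * tid;
    int en = (ndata / nthreads) * (tid + 1);
    for (int i = st; i < en; i++) {
        atomic_total_sum += d[i];   // atomic read-modify-write on memory,
                                    // visible to other threads as it happens
    }
}

An equivalent fix with locking would guard each update (or, better, a per-thread partial sum committed once at the end) with a std::mutex and std::lock_guard.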
Just because the hardware has instructions and protocols that can handle this does not mean the compiler generates code that exercises those mechanisms; it only does so when the language features you use require it.
Upvotes: 1
Reputation: 62110
The MESI protocol (Wikipedia) is a mechanism which improves the performance of maintaining the coherence of a CPU Cache.
A CPU Cache (Wikipedia) is a hardware component which lies between the CPU and the main memory of a computer.
The coherence of the CPU cache is completely unrelated to the thread-safety of the std::vector<>
that your code might use, or of anything else in your program. In other words, one has absolutely nothing to do with the other.
Knowledge of the existence of the CPU cache and of its inner workings is mainly helpful to expert programmers as a way of writing code that performs better by taking the way the cache works into account. (For example, when iterating over all elements of a two-dimensional array declared as Array[Y][X]
, you will usually get better performance if your outer loop iterates along Y and your inner loop iterates along X.)
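A minimal sketch of that loop-order effect (the array size and function names are illustrative):

#include <array>

constexpr int Y = 1024, X = 1024;
std::array<std::array<int, X>, Y> Array{};   // row-major: Array[y][x] and Array[y][x+1] are adjacent in memory

long long sum_row_major()
{
    long long sum = 0;
    for (int y = 0; y < Y; y++)        // outer loop along Y
        for (int x = 0; x < X; x++)    // inner loop along X walks contiguous memory,
            sum += Array[y][x];        // so each cache line is fully used before eviction
    return sum;
}

long long sum_column_major()
{
    long long sum = 0;
    for (int x = 0; x < X; x++)        // swapped order jumps X * sizeof(int) bytes
        for (int y = 0; y < Y; y++)    // between consecutive accesses, touching a
            sum += Array[y][x];        // different cache line almost every time
    return sum;
}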
Upvotes: 2