Reputation: 125
I want to implement the following function that marks some elements of array by 1.
void mark(std::vector<signed char>& marker)
{
#pragma omp parallel for schedule(dynamic, M)
    for (int i = 0; i < static_cast<int>(marker.size()); i++)
        marker[i] = 0;

#pragma omp parallel for schedule(dynamic, M)
    for (int i = 0; i < static_cast<int>(marker.size()); i++)
        marker[getIndex(i)] = 1; // is it ok ?
}
What will happen if we try to set the value of the same element to 1 from different threads at the same time? Will it simply be set to 1, or can this loop lead to unexpected behavior?
Upvotes: 5
Views: 1665
Reputation: 8032
The other answer is wrong in one fundamental part (emphasis mine):
If you write with different threads to the very same location, you get a race condition. This is not necessarily undefined behaviour, but nevertheless it needs to be avoided.
Having a look at the OpenMP standard, section 1.4.1 says (also emphasis mine):
If multiple threads write without synchronization to the same memory unit, including cases due to atomicity considerations as described above, then a data race occurs. Similarly, if at least one thread reads from a memory unit and at least one thread writes without synchronization to that same memory unit, including cases due to atomicity considerations as described above, then a data race occurs. If a data race occurs then the result of the program is unspecified.
Technically, the OP's snippet is in undefined-behavior territory. This means there is no guarantee whatsoever about the program's behavior until the UB is removed from it.
The simplest way to do it is to protect memory access with an atomic operation:
#pragma omp parallel for schedule(dynamic, M)
for (int i = 0; i < static_cast<int>(marker.size()); i++) {
#pragma omp atomic write seq_cst
    marker[getIndex(i)] = 1;
}
but that will probably hurt performance noticeably (as was correctly noted by @schorsch312).
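If the atomic writes turn out to be too expensive, one possible workaround (just a sketch, not the only option) is to let every thread mark a private copy of the array and merge the copies afterwards, so the hot loop has no shared writes at all. Here getIndex is assumed to be the mapping from the question, and the chunk size 1024 is an arbitrary stand-in for the question's M:
#include <algorithm>
#include <vector>

int getIndex(int i); // assumed: the same index mapping used in the question

void mark_without_atomics(std::vector<signed char>& marker)
{
    const int n = static_cast<int>(marker.size());
    std::fill(marker.begin(), marker.end(), 0);

    #pragma omp parallel
    {
        std::vector<signed char> local(n, 0);   // private to this thread

        #pragma omp for schedule(dynamic, 1024) nowait
        for (int i = 0; i < n; i++)
            local[getIndex(i)] = 1;             // race-free: local is private

        // Only the merge is serialized, not the marking loop itself.
        #pragma omp critical
        {
            for (int i = 0; i < n; i++)
                if (local[i])
                    marker[i] = 1;
        }
    }
}
The trade-off is one extra copy of the array per thread plus a serialized merge, so whether this beats the atomic version depends on the array size and the number of threads; it is worth measuring both.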
Upvotes: 4
Reputation: 5714
If you write with different threads to the very same location, you get a race condition. This is not necessarily undefined behaviour, but nevertheless it needs to be avoided.
Since all threads write the same value "1", it might be OK in practice, but if you write real data it probably is not.
Side note: for good performance, different threads should not work on memory that is too close together. If two threads write to two different elements in the same cache line, that cache line will be invalidated for all other threads (false sharing). This leads to cache misses and will spoil your performance gain (parallel execution might even be slower than single-threaded execution).
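To illustrate the point with a deliberately simple example (my own sketch, unrelated to the question's marker array): per-thread counters packed next to each other in memory are a classic source of this problem, and padding each counter to a full cache line (64 bytes is an assumed cache-line size here) keeps every thread on its own line.
#include <vector>
#include <omp.h>

struct PaddedCounter {
    long value = 0;
    char pad[64 - sizeof(long)]; // keep neighbouring counters on separate (assumed 64-byte) cache lines
};

long count_even(const std::vector<int>& data)
{
    std::vector<PaddedCounter> counters(omp_get_max_threads());
    const int n = static_cast<int>(data.size());

    #pragma omp parallel
    {
        const int tid = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < n; i++)
            if (data[i] % 2 == 0)
                counters[tid].value++; // each thread writes only its own padded slot
    }

    long total = 0;
    for (const auto& c : counters)
        total += c.value;
    return total;
}
Without the pad member, all counters would sit in one or two cache lines and every increment would bounce those lines between cores.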
Upvotes: 3