This came up while thinking about ThreadSanitizer warnings after using parallel std::for_each.
Algorithms like std::for_each with parallel execution policies can execute code in worker threads created by the implementation. Do these threads synchronize with the call to and return from for_each in the calling thread, or something to that effect? Common sense suggests that they should, but I can't find a guarantee in the C++20 standard.
Consider the following simple example (try on godbolt):
```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <iostream>

void increment(int &a) {
    a++;
}

int main() {
    constexpr std::size_t n = 1000;
    static int arr[n];
    arr[0] = 3;
    std::for_each(std::execution::par, arr, arr + n, increment);
    std::cout << arr[0] << std::endl;
    return 0;
}
```
This is intended to always output 4.
The implementation may call increment(arr[0]) in another thread, which does arr[0]++. Does the store arr[0] = 3 in the main thread happen before arr[0]++ in the sense of [intro.races] p10? Likewise, does arr[0]++ happen before the load of arr[0] in std::cout << arr[0]? I would naively expect that they should, but I can't see any way to prove it. [algorithms.parallel] doesn't seem to contain anything about synchronization with surrounding code.
If not, then the example contains a data race and its behavior is undefined. This would make std::execution::par rather difficult to use correctly, and I would wonder whether it is a defect.
Without such a guarantee, the implementation could conceivably do something like the following:
```cpp
#include <atomic>
#include <thread>

static int arr[1000];

std::atomic<int *> work = nullptr;

void do_work() {
    int *p;
    // relaxed load: does not synchronize with the relaxed store in main
    while (!(p = work.load(std::memory_order_relaxed)))
        std::this_thread::yield();
    (*p)++;
}

// started at program startup
std::thread worker_thread(do_work);

int main() {
    // ...
    arr[0] = 3;
    // for_each does the following:
    work.store(&arr[0], std::memory_order_relaxed);
    worker_thread.join();
    // ...
}
```
If it did then we really would have a data race.
Using cppreference:
The execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized. The invocations of element access functions in parallel algorithms invoked with this policy (usually specified as std::execution::par) are permitted to execute in either the invoking thread or in a thread implicitly created by the library to support parallel algorithm execution. Any such invocations executing in the same thread are indeterminately sequenced with respect to each other.
The operations done in threads (logically) created in std::for_each are sequenced after the thread is created.
From the draft:
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution::parallel_policy are permitted to execute either in the invoking thread of execution or in a thread of execution implicitly created by the library to support parallel algorithm execution. If the threads of execution created by thread ([thread.thread.class]) or jthread ([thread.jthread.class]) provide concurrent forward progress guarantees ([intro.progress]), then a thread of execution implicitly created by the library will provide parallel forward progress guarantees; otherwise, the provided forward progress guarantee is implementation-defined. Any such invocations executing in the same thread of execution are indeterminately sequenced with respect to each other.
The wording is slightly different, but similar.
I suppose you could weasel around it: there is no explicit guarantee that the thread implicitly created by the library to support parallel algorithm execution needs to be created (or joined) within the std::for_each call.
But the postconditions of the various algorithms need to be met, which should deal with the "after" problem. How it is guaranteed that the postconditions have occurred before std::for_each returns isn't specified, but it is guaranteed that they have occurred. To me, that reads as if the application of the function to each element happens before std::for_each returns.
For startup sequencing, the best I can do is to read the standard as implying that it must behave as if the threads used for this purpose are created inside std::for_each, so there is a sequencing guarantee. But I admit this wording is a bit vague; "created by the library" is pretty passive voice.