Reputation: 45444
The OpenMP standard only considers C++98 (ISO/IEC 14882:1998). This means that no standard supports the use of OpenMP under C++03 or even C++11. Thus, any program that uses C++ >98 together with OpenMP operates outside of the standards: even if it works under certain conditions, it is unlikely to be portable and certainly never guaranteed to be.
The situation is even worse for C++11, which brings its own multi-threading support; that support will very likely clash with OpenMP in some implementations.
So, how safe is it to use OpenMP with C++03 and C++11?
Can one safely use C++11 multi-threading as well as OpenMP in one and the same program but without interleaving them (i.e. no OpenMP statement in any code passed to C++11 concurrent features and no C++11 concurrency in threads spawned by OpenMP)?
I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.
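To make this concrete, here is a minimal sketch of the pattern I have in mind (the names and the trivial workload are made up purely for illustration):

#include <cstddef>
#include <thread>
#include <vector>

std::vector<double> data(1000000, 1.0);

void openmp_phase()          // OpenMP only, no C++11 concurrency inside
{
    #pragma omp parallel for
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] *= 2.0;
}

void cxx11_phase()           // C++11 threads only, entered after the OpenMP phase
{
    std::thread t([] { for (double &x : data) x += 1.0; });
    t.join();
}

int main()
{
    openmp_phase();          // step 1: process data with OpenMP
    cxx11_phase();           // step 2: process the same data with std::thread
}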
Upvotes: 45
Views: 11855
Reputation: 22670
OpenMP 5.0 now defines the interaction with C++11, but in general, using anything from C++11 and later "may result in unspecified behavior". From the specification:

"This OpenMP API specification refers to ISO/IEC 14882:2011 as C++11. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior."
Upvotes: 1
Reputation: 5652
OpenMP is often (I am aware of no exceptions) implemented on top of Pthreads, so you can reason about some of the interoperability questions by thinking about how C++11 concurrency interoperates with Pthread code.
I don't know if oversubscription due to the use of multiple threading models is an issue for you, but it definitely is an issue for OpenMP. There is a proposal to address this in OpenMP 5; until then, how you solve it is implementation-defined. They are heavy hammers, but you can use OMP_WAIT_POLICY (OpenMP 4.5+), KMP_BLOCKTIME (Intel and LLVM), and GOMP_SPINCOUNT (GCC) to address this. I'm sure other implementations have something similar.
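As an illustration only, here is a sketch that sets those controls from within the process, assuming a POSIX system and that it runs before the OpenMP runtime first initializes (you could equally export them in the shell):

#include <cstdlib>   // setenv is POSIX, not standard C++

int main()
{
    // These must be set before the OpenMP runtime reads its environment,
    // i.e. before the first OpenMP construct executes.
    setenv("OMP_WAIT_POLICY", "PASSIVE", 1);  // standard control: idle threads yield rather than spin
    setenv("KMP_BLOCKTIME",   "0",       1);  // Intel/LLVM: milliseconds a thread spins after its work ends
    setenv("GOMP_SPINCOUNT",  "0",       1);  // GCC/libgomp: spin iterations before a thread sleeps
    // ... OpenMP code mixed with other threading models follows ...
    return 0;
}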
One issue where interoperability is a real concern is w.r.t. the memory model, i.e. how atomic operations behave. This is currently undefined, but you can still reason about it. For example, if you use C++11 atomics with OpenMP parallelism, you should be fine, but you are responsible for using C++11 atomics correctly from OpenMP threads.
Mixing OpenMP atomics and C++11 atomics is a bad idea. We (the OpenMP language committee working group charged with looking at OpenMP 5 base language support) are currently trying to sort this out. Personally, I think C++11 atomics are better than OpenMP atomics in every way, so my recommendation is that you use C++11 (or C11, or __atomic) for your atomics and leave #pragma omp atomic for the Fortran programmers.
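To illustrate the distinction, here is a minimal sketch of my own (not committee guidance): the point is simply that each variable should live under exactly one atomic regime.

#include <atomic>

int              plain_counter = 0;    // updated only via OpenMP atomics
std::atomic<int> cxx_counter{0};       // updated only via C++11 atomics

void increment_counters()
{
    #pragma omp atomic
    plain_counter++;                   // OpenMP atomic update on a plain int

    cxx_counter.fetch_add(1, std::memory_order_relaxed);  // C++11 atomic update

    // Never apply both mechanisms to the same variable:
    // that interaction is currently undefined.
}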
Below is example code that uses C++11 atomics with OpenMP threads. It works as designed everywhere I have tested it.
Full disclosure: Like Jim and Mike, I work for Intel :-)
#if defined(__cplusplus) && (__cplusplus >= 201103L)

#include <iostream>
#include <iomanip>
#include <atomic>
#include <chrono>
#include <cstdlib>   // atoi

#ifdef _OPENMP
# include <omp.h>
#else
# error No OpenMP support!
#endif
#ifdef SEQUENTIAL_CONSISTENCY
auto load_model = std::memory_order_seq_cst;
auto store_model = std::memory_order_seq_cst;
#else
auto load_model = std::memory_order_acquire;
auto store_model = std::memory_order_release;
#endif
int main(int argc, char * argv[])
{
    int nt = omp_get_max_threads();
#if 1
    if (nt != 2) omp_set_num_threads(2);
#else
    if (nt < 2) omp_set_num_threads(2);
    if (nt % 2 != 0) omp_set_num_threads(nt-1);
#endif

    int iterations = (argc>1) ? atoi(argv[1]) : 1000000;

    std::cout << "thread ping-pong benchmark\n";
    std::cout << "num threads  = " << omp_get_max_threads() << "\n";
    std::cout << "iterations   = " << iterations << "\n";
#ifdef SEQUENTIAL_CONSISTENCY
    std::cout << "memory model = " << "seq_cst";
#else
    std::cout << "memory model = " << "acq-rel";
#endif
    std::cout << std::endl;

    std::atomic<int> left_ready  = {-1};
    std::atomic<int> right_ready = {-1};

    int left_payload  = 0;
    int right_payload = 0;

    #pragma omp parallel
    {
        int me = omp_get_thread_num();
        /// 0=left 1=right
        bool parity = (me % 2 == 0);

        int junk = 0;

        /// START TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t0 = std::chrono::high_resolution_clock::now();

        for (int i=0; i<iterations; ++i) {
            if (parity) {
                /// send to left
                left_payload = i;
                left_ready.store(i, store_model);

                /// recv from right
                while (i != right_ready.load(load_model));
                //std::cout << i << ": left received " << right_payload << std::endl;
                junk += right_payload;
            } else {
                /// recv from left
                while (i != left_ready.load(load_model));
                //std::cout << i << ": right received " << left_payload << std::endl;
                junk += left_payload;

                /// send to right
                right_payload = i;
                right_ready.store(i, store_model);
            }
        }

        /// STOP TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

        /// PRINT TIME
        std::chrono::duration<double> dt = std::chrono::duration_cast<std::chrono::duration<double>>(t1-t0);
        #pragma omp critical
        {
            std::cout << "total time elapsed = " << dt.count() << "\n";
            std::cout << "time per iteration = " << dt.count()/iterations << "\n";
            std::cout << junk << std::endl;
        }
    }

    return 0;
}
#else // C++11
#error You need C++11 for this test!
#endif // C++11
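For reference, the example should build with something like g++ -std=c++11 -fopenmp pingpong.cpp under GCC (file name illustrative; add -DSEQUENTIAL_CONSISTENCY to select the sequentially consistent variant); the exact flags are compiler-specific.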
Upvotes: 4
Reputation: 221
Like Jim Cownie, I'm also an Intel employee. I agree with him that Intel Threading Building Blocks (Intel TBB) might be a good option, since it has loop-level parallelism like OpenMP but also other parallel algorithms, concurrent containers, and lower-level features. And TBB tries to keep up with the current C++ standard.
And to clarify for Walter, Intel TBB includes a parallel_reduce algorithm as well as high-level support for atomics and mutexes.
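For instance, a sum reduction with the lambda-based form of parallel_reduce looks roughly like this (a sketch; consult the User Guide linked below for the authoritative interface):

#include <cstddef>
#include <functional>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

double parallel_sum(const std::vector<double>& v)
{
    return tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()),  // iteration space, split automatically
        0.0,                                           // identity of the reduction
        [&](const tbb::blocked_range<std::size_t>& r, double local) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                local += v[i];
            return local;                              // partial sum for this subrange
        },
        std::plus<double>());                          // combines partial sums
}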
You can find the Intel® Threading Building Blocks User Guide at http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/tbb_userguide/title.htm. The User Guide gives an overview of the features in the library.
Upvotes: 5
Reputation: 2869
I'm actually interested in high-performance computing, but OpenMP (currently) does not serve my purpose well enough: it's not flexible enough (my algorithm is not loop based)
Maybe you are really looking for TBB? It provides support for loop- and task-based parallelism, as well as a variety of parallel data structures, in standard C++, and is both portable and open-source.
(Full disclosure: I work for Intel, which is heavily involved with TBB, though I don't actually work on TBB but on OpenMP :-); I am certainly not speaking for Intel!)
Upvotes: 8
Reputation: 74455
Walter, I believe I not only told you the current state of things in that other discussion, but also provided you with information directly from the source (i.e. from my colleague who is part of the OpenMP language committee).
OpenMP was designed as a lightweight data-parallel addition to FORTRAN and C, later extended to C++ idioms (e.g. parallel loops over random-access iterators) and to task parallelism with the introduction of explicit tasks. It is meant to be as portable across as many platforms as possible and to provide essentially the same functionality in all three languages. Its execution model is quite simple - a single-threaded application forks teams of threads in parallel regions, runs some computational tasks inside and then joins the teams back into serial execution. Each thread from a parallel team can later fork its own team if nested parallelism is enabled.
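A minimal illustration of that fork-join model, with nested parallelism enabled via the pre-5.0 omp_set_nested interface:

#include <cstdio>
#include <omp.h>

int main()
{
    omp_set_nested(1);                       // allow threads to fork their own teams

    #pragma omp parallel num_threads(2)      // fork: the serial thread becomes a team of two
    {
        int outer = omp_get_thread_num();

        #pragma omp parallel num_threads(2)  // each team member forks a nested team
        {
            std::printf("outer %d, inner %d\n", outer, omp_get_thread_num());
        }
    }                                        // join: back to serial execution

    return 0;
}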
Since the main usage of OpenMP is in High Performance Computing (after all, its directive and execution model was borrowed from High Performance Fortran), the main goal of any OpenMP implementation is efficiency, not interoperability with other threading paradigms. On some platforms an efficient implementation can only be achieved if the OpenMP run-time is the only one in control of the process's threads. There are also certain aspects of OpenMP that might not play well with other threading constructs, for example the limit on the number of threads set by OMP_THREAD_LIMIT when forking two or more concurrent parallel regions.
Since the OpenMP standard itself does not strictly forbid the use of other threading paradigms, but neither does it standardise interoperability with them, supporting such functionality is up to the implementers. This means that some implementations might provide safe concurrent execution of top-level OpenMP regions while others might not. The x86 implementers pledge to support it, maybe because most of them are also proponents of other execution models (e.g. Intel with Cilk and TBB, GCC with C++11, etc.) and x86 is usually considered an "experimental" platform (other vendors are usually much more conservative).
OpenMP 4.0 also goes no further than ISO/IEC 14882:1998 for the C++ features it employs (the SC12 draft is here). The standard now includes things like portable thread affinity, which definitely does not play well with other threading paradigms that might provide their own binding mechanisms clashing with those of OpenMP. Once again, the OpenMP language is targeted at HPC (data- and task-parallel scientific and engineering applications), whereas the C++11 constructs are targeted at general-purpose computing. If you want fancy C++11 concurrent stuff, use C++11 only; if you really need to mix it with OpenMP, then stick to the C++98 subset of language features if you want to stay portable.
I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.
There are no obvious reasons why what you want should not be possible, but it is up to your OpenMP compiler and run-time. There are free and commercial libraries that use OpenMP for parallel execution (for example MKL), but there are always warnings (although sometimes hidden deep in their user manuals) about possible incompatibilities with multithreaded code, telling you what is possible and when. As always, this is outside the scope of the OpenMP standard and hence YMMV.
Upvotes: 27