Reputation: 45444
The OpenMP standard only considers C++98 (ISO/IEC 14882:1998). This means that no standard supports the use of OpenMP under C++03 or even C++11. Thus, any program that uses C++ >98 together with OpenMP operates outside of the standards: even if it works under certain conditions, it is unlikely to be portable and certainly never guaranteed to be.
The situation is even worse for C++11, which brings its own multi-threading support; that support will very likely clash with OpenMP in some implementations.
So, how safe is it to use OpenMP with C++03 and C++11?
Can one safely use C++11 multi-threading as well as OpenMP in one and the same program but without interleaving them (i.e. no OpenMP statement in any code passed to C++11 concurrent features and no C++11 concurrency in threads spawned by OpenMP)?
I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.
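To make this concrete, here is a minimal sketch of the pattern I have in mind (the names and the trivial workload are made up purely for illustration):

#include <cstddef>
#include <thread>
#include <vector>

std::vector<double> data(1000000, 1.0);

void openmp_phase()          // OpenMP only, no C++11 concurrency inside
{
    #pragma omp parallel for
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] *= 2.0;
}

void cxx11_phase()           // C++11 threads only, entered after the OpenMP phase
{
    std::thread t([] { for (double &x : data) x += 1.0; });
    t.join();
}

int main()
{
    openmp_phase();          // step 1: process data with OpenMP
    cxx11_phase();           // step 2: process the same data with std::thread
}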
Upvotes: 45
Views: 11855
Reputation: 22670
OpenMP 5.0 now defines the interaction with C++11, but in general, using anything from C++11 and later "may result in unspecified behavior". From the specification:

"This OpenMP API specification refers to ISO/IEC 14882:2011 as C++11. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior."
Upvotes: 1
Reputation: 5652
OpenMP is often (I am aware of no exceptions) implemented on top of Pthreads, so you can reason about some of the interoperability questions by thinking about how C++11 concurrency interoperates with Pthread code.
I don't know if oversubscription due to the use of multiple threading models is an issue for you, but it definitely is an issue for OpenMP. There is a proposal to address this in OpenMP 5; until then, how you solve it is implementation-defined. They are heavy hammers, but you can use OMP_WAIT_POLICY (OpenMP 4.5+), KMP_BLOCKTIME (Intel and LLVM), and GOMP_SPINCOUNT (GCC) to address this. I'm sure other implementations have something similar.
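As an illustration only, here is a sketch that sets those controls from within the process, assuming a POSIX system and that it runs before the OpenMP runtime first initializes (you could equally export them in the shell):

#include <cstdlib>   // setenv is POSIX, not standard C++

int main()
{
    // These must be set before the OpenMP runtime reads its environment,
    // i.e. before the first OpenMP construct executes.
    setenv("OMP_WAIT_POLICY", "PASSIVE", 1);  // standard control: idle threads yield rather than spin
    setenv("KMP_BLOCKTIME",   "0",       1);  // Intel/LLVM: milliseconds a thread spins after its work ends
    setenv("GOMP_SPINCOUNT",  "0",       1);  // GCC/libgomp: spin iterations before a thread sleeps
    // ... OpenMP code mixed with other threading models follows ...
    return 0;
}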
One issue where interoperability is a real concern is w.r.t. the memory model, i.e. how atomic operations behave. This is currently undefined, but you can still reason about it. For example, if you use C++11 atomics with OpenMP parallelism, you should be fine, but you are responsible for using C++11 atomics correctly from OpenMP threads.
Mixing OpenMP atomics and C++11 atomics is a bad idea. We (the OpenMP language committee working group charged with looking at OpenMP 5 base language support) are currently trying to sort this out. Personally, I think C++11 atomics are better than OpenMP atomics in every way, so my recommendation is that you use C++11 (or C11, or __atomic) for your atomics and leave #pragma omp atomic for the Fortran programmers.
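To illustrate the distinction, here is a minimal sketch of my own (not committee guidance): the point is simply that each variable should live under exactly one atomic regime.

#include <atomic>

int              plain_counter = 0;    // updated only via OpenMP atomics
std::atomic<int> cxx_counter{0};       // updated only via C++11 atomics

void increment_counters()
{
    #pragma omp atomic
    plain_counter++;                   // OpenMP atomic update on a plain int

    cxx_counter.fetch_add(1, std::memory_order_relaxed);  // C++11 atomic update

    // Never apply both mechanisms to the same variable:
    // that interaction is currently undefined.
}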
Below is example code that uses C++11 atomics with OpenMP threads. It works as designed everywhere I have tested it.
Full disclosure: Like Jim and Mike, I work for Intel :-)
#if defined(__cplusplus) && (__cplusplus >= 201103L)

#include <iostream>
#include <iomanip>
#include <atomic>
#include <chrono>
#include <cstdlib>   // atoi

#ifdef _OPENMP
# include <omp.h>
#else
# error No OpenMP support!
#endif
#ifdef SEQUENTIAL_CONSISTENCY
auto load_model = std::memory_order_seq_cst;
auto store_model = std::memory_order_seq_cst;
#else
auto load_model = std::memory_order_acquire;
auto store_model = std::memory_order_release;
#endif
int main(int argc, char * argv[])
{
    int nt = omp_get_max_threads();
#if 1
    if (nt != 2) omp_set_num_threads(2);
#else
    if (nt < 2) omp_set_num_threads(2);
    if (nt % 2 != 0) omp_set_num_threads(nt-1);
#endif

    int iterations = (argc>1) ? atoi(argv[1]) : 1000000;

    std::cout << "thread ping-pong benchmark\n";
    std::cout << "num threads  = " << omp_get_max_threads() << "\n";
    std::cout << "iterations   = " << iterations << "\n";
#ifdef SEQUENTIAL_CONSISTENCY
    std::cout << "memory model = " << "seq_cst";
#else
    std::cout << "memory model = " << "acq-rel";
#endif
    std::cout << std::endl;

    std::atomic<int> left_ready  = {-1};
    std::atomic<int> right_ready = {-1};

    int left_payload  = 0;
    int right_payload = 0;

    #pragma omp parallel
    {
        int me = omp_get_thread_num();
        /// 0=left 1=right
        bool parity = (me % 2 == 0);

        int junk = 0;

        /// START TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t0 = std::chrono::high_resolution_clock::now();

        for (int i=0; i<iterations; ++i) {
            if (parity) {
                /// send to left
                left_payload = i;
                left_ready.store(i, store_model);

                /// recv from right
                while (i != right_ready.load(load_model));
                //std::cout << i << ": left received " << right_payload << std::endl;
                junk += right_payload;
            } else {
                /// recv from left
                while (i != left_ready.load(load_model));
                //std::cout << i << ": right received " << left_payload << std::endl;
                junk += left_payload;

                /// send to right
                right_payload = i;
                right_ready.store(i, store_model);
            }
        }

        /// STOP TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

        /// PRINT TIME
        std::chrono::duration<double> dt = std::chrono::duration_cast<std::chrono::duration<double>>(t1-t0);
        #pragma omp critical
        {
            std::cout << "total time elapsed = " << dt.count() << "\n";
            std::cout << "time per iteration = " << dt.count()/iterations << "\n";
            std::cout << junk << std::endl;
        }
    }

    return 0;
}
#else // C++11
#error You need C++11 for this test!
#endif // C++11
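For reference, the example should build with something like g++ -std=c++11 -fopenmp pingpong.cpp under GCC (file name illustrative; add -DSEQUENTIAL_CONSISTENCY to select the sequentially consistent variant); the exact flags are compiler-specific.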
Upvotes: 4
Reputation: 221
Like Jim Cownie, I'm also an Intel employee. I agree with him that Intel Threading Building Blocks (Intel TBB) might be a good option, since it has loop-level parallelism like OpenMP but also other parallel algorithms, concurrent containers, and lower-level features. And TBB tries to keep up with the current C++ standard.
And to clarify for Walter, Intel TBB includes a parallel_reduce algorithm as well as high-level support for atomics and mutexes.
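For instance, a sum reduction with the lambda-based form of parallel_reduce looks roughly like this (a sketch; consult the User Guide linked below for the authoritative interface):

#include <cstddef>
#include <functional>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>

double parallel_sum(const std::vector<double>& v)
{
    return tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()),  // iteration space, split automatically
        0.0,                                           // identity of the reduction
        [&](const tbb::blocked_range<std::size_t>& r, double local) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                local += v[i];
            return local;                              // partial sum for this subrange
        },
        std::plus<double>());                          // combines partial sums
}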
You can find the Intel® Threading Building Blocks User Guide at http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/tbb_userguide/title.htm. The User Guide gives an overview of the features in the library.
Upvotes: 5
Reputation: 2869
I'm actually interested in high-performance computing, but OpenMP (currently) does not serve my purpose well enough: it's not flexible enough (my algorithm is not loop based)
Maybe you are really looking for TBB? It provides support for loop- and task-based parallelism, as well as a variety of parallel data structures, in standard C++, and is both portable and open-source.
(Full disclosure: I work for Intel, which is heavily involved with TBB, though I don't actually work on TBB but on OpenMP :-); I am certainly not speaking for Intel!)
Upvotes: 8
Reputation: 74455
Walter, I believe I not only told you the current state of things in that other discussion, but also provided you with information directly from the source (i.e. from my colleague who is part of the OpenMP language committee).
OpenMP was designed as a lightweight data-parallel addition to FORTRAN and C, later extended to C++ idioms (e.g. parallel loops over random-access iterators) and to task parallelism with the introduction of explicit tasks. It is meant to be as portable across as many platforms as possible and to provide essentially the same functionality in all three languages. Its execution model is quite simple - a single-threaded application forks teams of threads in parallel regions, runs some computational tasks inside and then joins the teams back into serial execution. Each thread from a parallel team can later fork its own team if nested parallelism is enabled.
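A minimal illustration of that fork-join model, with nested parallelism enabled via the pre-5.0 omp_set_nested interface:

#include <cstdio>
#include <omp.h>

int main()
{
    omp_set_nested(1);                       // allow threads to fork their own teams

    #pragma omp parallel num_threads(2)      // fork: the serial thread becomes a team of two
    {
        int outer = omp_get_thread_num();

        #pragma omp parallel num_threads(2)  // each team member forks a nested team
        {
            std::printf("outer %d, inner %d\n", outer, omp_get_thread_num());
        }
    }                                        // join: back to serial execution

    return 0;
}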
Since the main usage of OpenMP is in High Performance Computing (after all, its directive and execution model was borrowed from High Performance Fortran), the main goal of any OpenMP implementation is efficiency, not interoperability with other threading paradigms. On some platforms an efficient implementation can only be achieved if the OpenMP run-time is the only one in control of the process's threads. There are also certain aspects of OpenMP that might not play well with other threading constructs, for example the limit on the number of threads set by OMP_THREAD_LIMIT when forking two or more concurrent parallel regions.
Since the OpenMP standard itself does not strictly forbid the use of other threading paradigms, but neither does it standardise interoperability with them, supporting such functionality is up to the implementers. This means that some implementations might provide safe concurrent execution of top-level OpenMP regions while others might not. The x86 implementers pledge to support it, maybe because most of them are also proponents of other execution models (e.g. Intel with Cilk and TBB, GCC with C++11, etc.) and x86 is usually considered an "experimental" platform (other vendors are usually much more conservative).
OpenMP 4.0 also goes no further than ISO/IEC 14882:1998 for the C++ features it employs (the SC12 draft is here). The standard now includes things like portable thread affinity, which definitely does not play well with other threading paradigms that might provide their own binding mechanisms clashing with those of OpenMP. Once again, the OpenMP language is targeted at HPC (data- and task-parallel scientific and engineering applications), whereas the C++11 constructs are targeted at general-purpose computing. If you want fancy C++11 concurrent stuff, use C++11 only; if you really need to mix it with OpenMP, then stick to the C++98 subset of language features if you want to stay portable.
I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.
There are no obvious reasons why what you want should not be possible, but it is up to your OpenMP compiler and run-time. There are free and commercial libraries that use OpenMP for parallel execution (for example MKL), but there are always warnings (although sometimes hidden deep in their user manuals) about possible incompatibilities with multithreaded code, telling you what is possible and when. As always, this is outside the scope of the OpenMP standard and hence YMMV.
Upvotes: 27