Itamar Katz

Reputation: 9645

std parallel algorithm seems to use only 1 thread

I am using the C++17 parallel standard library algorithms with the std::execution::par execution policy. I am on Ubuntu, on a laptop with 4 cores, using the clang 11 compiler and the CMake extension for VS Code for the build (I also checked with a plain single-command compilation, without CMake).

Based on the following observations, it seems the program only uses 1 thread:

  1. The run time is the same as with std::execution::seq (the regular, sequential algorithm).
  2. Using top -H, I see only 1 thread with ~100% CPU usage.
  3. Using Ubuntu's system monitor, I see one core active during execution (although the active core may change between calls to sort if I repeat them in a for loop).

Code example:

#include <vector>
#include <iostream>
#include <algorithm>
#include <execution>
#include <chrono>
#include <thread>
#include <cstdlib> // for rand()

int main()
{
    const int N = 10000000;
    std::vector<int> vec(N);
    std::chrono::duration<double> elapsed;
    unsigned int nThreads = std::thread::hardware_concurrency();
    std::cout << "number of available threads: " << nThreads << "\n"; // this prints "4"

    auto tstart = std::chrono::high_resolution_clock::now();
    std::generate(vec.begin(), vec.end(), []() {return rand() % 100;});

    //std::sort(std::execution::seq, vec.begin(), vec.end());
    std::sort(std::execution::par, vec.begin(), vec.end());
    auto tfinish = std::chrono::high_resolution_clock::now();

    elapsed = tfinish - tstart;
    std::cout << "Elapsed time: " << elapsed.count() << std::endl;

    return 0;
}

I thought the problem might be that I hadn't told CMake to link against the pthread library, so I changed CMakeLists.txt:

project(my_proj LANGUAGES C CXX)
find_package (Threads)
target_link_libraries (my_proj ${CMAKE_THREAD_LIBS_INIT})

But that made no difference.

Why doesn't it seem to run in parallel?

Upvotes: 6

Views: 945

Answers (1)

kc9jud

Reputation: 402

As @ildjarn notes, libstdc++ requires Intel TBB for parallel execution policies:

Note 3: The Parallel Algorithms have an external dependency on Intel TBB 2018 or later. If the <execution> header is included then -ltbb must be used to link to TBB.

Depending on the version of the libstdc++ headers you are using, the parallel algorithms may silently fall back to sequential execution if the TBB headers are not in the include path. One way to check whether TBB is in the path is to include one of the TBB headers explicitly:

#include <tbb/parallel_for.h>

An explicit include does not fail silently: if TBB is not in your include search path, you get a compile error.
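If you would rather have the build keep going and just report what it found, the same header check can be done with the C++17 __has_include preprocessor operator. The following is only a sketch: the macro TBB_HEADERS_VISIBLE and the helpers tbb_headers_visible() and tbb_status() are hypothetical names made up for this example, and it only tells you whether the TBB header is visible at compile time, not which backend libstdc++ actually selected.

```cpp
#include <string>

// Assumption: visibility of <tbb/parallel_for.h> is used as a stand-in for
// "TBB is installed", mirroring the manual include check above.
// __has_include is a standard preprocessor operator since C++17.
#if defined(__has_include)
#  if __has_include(<tbb/parallel_for.h>)
#    define TBB_HEADERS_VISIBLE 1
#  endif
#endif
#ifndef TBB_HEADERS_VISIBLE
#  define TBB_HEADERS_VISIBLE 0
#endif

// Hypothetical helper: true if the TBB header was found at compile time.
constexpr bool tbb_headers_visible() { return TBB_HEADERS_VISIBLE != 0; }

// Hypothetical helper: a short message suitable for printing at startup.
inline std::string tbb_status()
{
    return tbb_headers_visible()
        ? "TBB headers found; remember to link with -ltbb"
        : "TBB headers not found; std::execution::par may run sequentially";
}
```

Printing tbb_status() at the start of main shows which case applies to the current build configuration.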

Upvotes: 1
