BullyWiiPlaza

Reputation: 19225

std::for_each with std::execution::par_unseq not working on GCC but working with MSVC

I wanted to parallelize a for loop and found out about std::for_each and its execution policies. Surprisingly, the loop was not parallelized when I built it with GCC:

#include <iostream>
#include <algorithm>
#include <execution>
#include <chrono>
#include <thread>
#include <random>
#include <vector>

int main() {
    std::vector<int> foo;
    foo.reserve(1000);
    for (int i = 0; i < 1000; i++) {
        foo.push_back(i);
    }

    std::for_each(std::execution::par_unseq,
                  foo.begin(), foo.end(),
                  [](auto &&item) {
                      std::cout << item << std::endl;
                      std::random_device dev;
                      std::mt19937 rng(dev());
                      std::uniform_int_distribution<std::mt19937::result_type> dist6(10, 100);
                      std::this_thread::sleep_for(std::chrono::milliseconds(dist6(rng)));
                      std::cout << "Thread ID: " << std::this_thread::get_id() << std::endl;
                  });
}

This code still runs sequentially.

Using MSVC the code is parallelized and finishes much quicker.

GCC:

$ gcc --version
gcc (Ubuntu 10.1.0-2ubuntu1~18.04) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

MSVC:

>cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29112 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

CMakeLists.txt:

cmake_minimum_required(VERSION 3.17)
project(ParallelTesting)

set(CMAKE_CXX_STANDARD 20)

add_executable(ParallelTesting main.cpp)

Is there anything specific I need to do to enable parallelization with GCC as well?

ldd output of my binary:

$ ldd my_binary
    linux-vdso.so.1 (0x00007ffe9e6b9000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f79efaa0000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f79ef881000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f79ef4ad000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f79ef295000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f79eeea4000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f79f041a000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f79eeb06000)

The debug and release builds of the binary have essentially the same ldd output.

Upvotes: 9

Views: 3329

Answers (2)

Diomidis Spinellis

Reputation: 19375

I had the same problem, and the answer by @BullyWiiPlaza helped me use the required library and also verify the compiler's operation.

One additional issue I faced was that the library considered the work I provided to for_each(execution::par_unseq, … too small to parallelize. I had assumed that the library would have each thread call the function repeatedly, each on a different part of the iterator sequence.

I solved the problem by creating larger chunks on my own.

typedef pair<micro_work_type::iterator, micro_work_type::iterator> work_type;

void
worker(work_type &be)
{
    // Process every element of the chunk [be.first, be.second)
    for (auto v = be.first; v != be.second; v++) {
        // Work on *v
    }
}

[…]
        vector <work_type> chunks;
        auto pos = micro_work.begin();
        auto begin = pos;
        size_t i;
        for (i = 0; i < micro_work.size(); i++, pos++) {
            if (i > 0 && i % BATCH_SIZE == 0) {
                chunks.push_back(pair{begin, pos});
                begin = pos;
            }
        }
        // Push whatever is left, which also covers a final full batch
        // when the size is an exact multiple of BATCH_SIZE
        if (begin != pos)
            chunks.push_back(pair{begin, pos});

        for_each(execution::par_unseq, chunks.begin(), chunks.end(), worker);
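
The batching step can also be written as a small standalone helper. This is only a minimal sketch of the same idea: make_chunks is a hypothetical name that does not appear in my real code, and it works on any forward iterator range rather than on micro_work specifically.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into consecutive sub-ranges of at
// most batch_size elements each. A batch_size of 0 is treated as 1.
template <typename Iterator>
std::vector<std::pair<Iterator, Iterator>>
make_chunks(Iterator first, Iterator last, std::size_t batch_size)
{
    using diff_t = typename std::iterator_traits<Iterator>::difference_type;

    if (batch_size == 0)
        batch_size = 1;
    const diff_t limit = static_cast<diff_t>(batch_size);

    std::vector<std::pair<Iterator, Iterator>> chunks;
    while (first != last) {
        // Clamp the step so the final chunk ends exactly at last.
        // Note: std::distance is linear for non-random-access iterators.
        const diff_t step = std::min(limit, std::distance(first, last));
        Iterator next = std::next(first, step);
        chunks.emplace_back(first, next);
        first = next;
    }
    return chunks;
}
```

The returned vector can be passed to for_each together with a worker that takes a pair of iterators, in the same way as the chunks vector above.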

Upvotes: 0

BullyWiiPlaza

Reputation: 19225

I solved it by first upgrading my WSL Ubuntu distribution from 18.04 to 20.04. On 18.04, even after running sudo apt install gcc libtbb-dev to install TBB, I still got the following error: #error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported. This is caused by TBB being too old.

Now with TBB 2020.1-2 installed, it's working as expected:

$ sudo apt install libtbb-dev
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
libtbb-dev is already the newest version (2020.1-2).
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.

This answer describes all the details very well.

Since I'm using CMake I also had to add the following line to my CMakeLists.txt:

# Link against the dependency of Intel TBB (for parallel C++17 algorithms)
target_link_libraries(${PROJECT_NAME} tbb)
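
To check that the loop really runs on several threads after linking TBB, counting the distinct thread IDs is easier than reading them from the console output. The following is only a rough sketch: ThreadIdCollector is a hypothetical helper name, not something from the standard library or TBB.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical helper: remembers which threads called record(), so the
// number of participating threads can be inspected after the loop.
class ThreadIdCollector {
public:
    // Call from inside the loop body; safe to call concurrently.
    void record() {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.insert(std::this_thread::get_id());
    }

    // Number of distinct threads that have called record() so far.
    std::size_t distinct_threads() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<std::thread::id> ids_;
};
```

Calling record() inside the lambda passed to std::for_each and printing distinct_threads() afterwards gives 1 for a sequential run and should typically give more than 1 once the TBB backend is in use. The sketch assumes std::execution::par rather than par_unseq, because locking a mutex is not permitted in code executed under par_unseq.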

Upvotes: 9
