FrenchOwl
FrenchOwl

Reputation: 73

Clang, OpenMP and custom vector/matrix reduction

I had to use a homebrewed gcc until now to compile OMP-augmented code on my Mac.

Good news is, Apple Clang is now able to find the OMP headers (at least in its Apple LLVM version 9.1.0 (clang-902.0.39.2) version).

Bad news are that custom reduction clauses that used to work do not anymore. I've attached the code snippet below that demonstrates my problem. It immediately crashes upon entering the parallel block with either a segfault or the following error:

DebugOMP(46436,0x7fff8fc12380) malloc: *** error for object 0x7fff8fc02000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

Is there a way to fix this reduction? Simpler OMP clauses like #pragma omp parallel for work fine. I am using Armadillo 9.100.5. The same issue happens with Eigen.

main.cpp :

#include <armadillo>

#pragma omp declare reduction( + : arma::vec : omp_out += omp_in ) \
initializer( omp_priv = omp_orig )


int main() {

    int N = 10000;
    int M = 100;
    double a = 0;

    // Built-in reduction, works
    #pragma omp parallel for reduction(+:a)
    for (int k = 0; k < M; ++k){
        a += k;
    }

    std::cout << a << std::endl;

    arma::vec v = arma::zeros<arma::vec>(M);

    // Parallel access, works
    #pragma omp parallel for
    for (int k = 0; k < M; ++k){
        v(k) = k;
    }

    std::cout << v << std::endl;

    // Custom, reduction, segfaults
    #pragma omp parallel for reduction(+:v)
    for (int i = 0; i < N; ++i){
        v += arma::ones<arma::vec>(v.n_rows);
    }

    std::cout << v << std::endl;

    return 0;
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0.0)

# Building procedure
get_filename_component(dirName ${CMAKE_CURRENT_SOURCE_DIR} NAME)
set(EXE_NAME ${dirName} CACHE STRING "Name of executable to be created.")

project(${EXE_NAME})

# Find Armadillo 
find_package(Armadillo REQUIRED )
include_directories(${ARMADILLO_INCLUDE_DIRS})

# Find OpenMP
find_package(OpenMP)
if(OPENMP_FOUND)
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()

# Add source files in root directory
add_executable(${EXE_NAME}
main.cpp)


# Linking
set(library_dependencies ${ARMADILLO_LIBRARIES} )


target_link_libraries(${EXE_NAME} ${library_dependencies} OpenMP::OpenMP_CXX)

Upvotes: 3

Views: 603

Answers (2)

Z boson
Z boson

Reputation: 33679

You can manually do the reduction like this

#pragma omp parallel
{
  arma::vec t = arma::zeros<arma::vec>(M);
  #pragma omp for nowait
  for (int i = 0; i < N; ++i) t += arma::ones<arma::vec>(v.n_rows);
  #pragma omp critical
  v += t;
}

That works with Clang. This could help you figure out how to define the initializer-expr

For example this works with GCC 7

#pragma omp declare reduction( + : arma::vec : omp_out += omp_in ) \
initializer( omp_priv = arma::zeros<arma::vec>(omp_orig.n_rows))

However with Clang 5.0 the code hangs so I'm not sure what Clang's problem is. I tried other initializer-expr variations but none of them got Clang to work.


I installed clang7 and the OP's code works fine. In general I think it's a better idea to be explicitly set the vector to zero like this

initializer( omp_priv = arma::zeros<arma::vec>(omp_orig.n_rows))

rather than implicitly like this

initializer(omp_priv = omp_orig)

because the implicit case assumes the constructor initializes to zero.

Upvotes: 1

ggael
ggael

Reputation: 29205

Since the original question also asked about Eigen, here is a self-contained example working with gcc 5, 6, 7, 8 and clang 6. It segfaults with clang 5, likely a bug on clang 5 side. This is essentially the same solution as proposed by Z boson.

#include <Eigen/Core>
#include <iostream>
using namespace Eigen;

typedef VectorXd vec;

#pragma omp declare reduction( + : vec : omp_out += omp_in ) \
  initializer( omp_priv = vec::Zero(omp_orig.size()) )

int main() {
    int N = 10000;
    int M = 100;
    vec v = vec::LinSpaced(M,0,M-1);

    #pragma omp parallel for reduction(+:v)
    for (int i = 0; i < N; ++i){
        v += vec::Ones(v.size());
    }

    std::cout << v.transpose() << std::endl;
    return 0;
}

Upvotes: 0

Related Questions