mfeuling

Reputation: 51

Am I able to use parallel STL algorithms from C++17/C++20 in Matlab MEX functions?

I am putting together a minimal example that uses the C++17/20 parallel algorithms inside Matlab MEX functions. I can compile the MEX function and call it from Matlab, but when I change the execution policy of my STL call from "seq" to "par", Matlab reports a runtime linkage error. The code and error message follow:

test.m (Matlab top-level script):

vec_in = zeros(1, 5, 'single');  % 1x5 single-precision row vector; the MEX code reads it as float*
coeff = 0.05;

vec_out = test_mex_gateway(vec_in, coeff);

test_mex_gateway.cpp (C++ interface to Matlab):

#include "mex.h"

extern void test_execute(float *array_in, float *array_out, const size_t vec_size, const float coeff);

void mexFunction( int nlhs,
                  mxArray *plhs[],
                  int nrhs,
                  const mxArray *prhs[] )
{
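    // Check for proper number of input and output arguments
    if( nrhs != 2 )
    {
        mexErrMsgTxt( "2 input arguments required: input_data, coeff" );
    }

    if( nlhs > 1 )
    {
        mexErrMsgTxt( "Too many output arguments." );
    }

    // The C++ code reads the input through a float*, so it must be real single data
    if( !mxIsSingle(prhs[0]) || mxIsComplex(prhs[0]) )
    {
        mexErrMsgTxt( "input_data must be a real single-precision array." );
    }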

    // Input Parameters
    float *input_data = (float *) mxGetData(prhs[0]);
    const float coeff = static_cast<float>(mxGetScalar(prhs[1]));

    // Treat the input as a flat vector, whether it is a row or a column
    const size_t vec_len = mxGetNumberOfElements(prhs[0]);

    // Output data: a 1 x vec_len single-precision row vector
    plhs[0] = mxCreateNumericMatrix(1, vec_len, mxSINGLE_CLASS, mxREAL);
    float *output_data = (float *) mxGetData(plhs[0]);

    test_execute(input_data, output_data, vec_len, coeff);

}

test_execute.cpp (This is where the actual C++ STL call is made):

#include <execution> // std::execution::*
#include <numeric>   // std::exclusive_scan()

void test_execute(float *array_in, float *array_out, const size_t vec_size, const float coeff)
{
    std::exclusive_scan
    (
        std::execution::par, // std::execution::seq works here for Mex call, par does not
        array_in,
        array_in + vec_size,
        array_out,
        0.0f,
        [coeff](float a, float b)
        {
            float ret = a + b + coeff;
            return ret;
        }
    );
}
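For reference, with init 0.0f and the lambda above, std::exclusive_scan writes init to array_out[0] and then array_out[i] = op(array_out[i-1], array_in[i-1]), so each step adds one input element plus coeff. The sketch below spells that recurrence out as a plain loop next to the policy-free std::exclusive_scan overload, which builds without TBB. reference_scan and library_scan are hypothetical helper names, not part of the code above.

```cpp
#include <cstddef>
#include <numeric>  // std::exclusive_scan (overload without an execution policy)
#include <vector>

// Hypothetical helper: the same exclusive scan as test_execute, written as a loop.
// out[0] = init; out[i] = op(out[i-1], in[i-1]) with op(a, b) = a + b + coeff.
std::vector<float> reference_scan(const std::vector<float>& in, float coeff)
{
    std::vector<float> out(in.size());
    float running = 0.0f;  // same init value as test_execute
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = running;                   // exclusive: in[i] is not included yet
        running = running + in[i] + coeff;  // op(running, in[i])
    }
    return out;
}

// Hypothetical helper: the same computation through the standard algorithm,
// using the sequential overload so no TBB is needed to build or run it.
std::vector<float> library_scan(const std::vector<float>& in, float coeff)
{
    std::vector<float> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0.0f,
                        [coeff](float a, float b) { return a + b + coeff; });
    return out;
}
```

For inputs {0, 1, 2, 3, 4} and coeff = 0.5, both helpers produce {0, 0.5, 2, 4.5, 8}. The parallel overload may regroup the operations (the lambda is associative in exact arithmetic, but float addition is not), so a "par" result can differ from these values in the last bits; comparing against a tolerance is safer than exact equality.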

I also have a standalone main function that replaces the Mex wrapper for a pure C++ test, test_standalone.cpp:

#include <cstddef>
#include <iostream>
#include <string>   // std::stof
#include <vector>

const size_t VEC_NUM_ELEM = 10;

extern void test_execute(float *array_in, float *array_out, const size_t vec_size, const float coeff);

int main(int argc, char **argv)
{
    if (argc != 2)
    {
        std::cout << "Try: " << argv[0] << " <coeff>" << std::endl;
        return -1;
    }

    const float coeff = std::stof(argv[1]);

    std::cout << "Coeff: " << coeff << std::endl;

    // std::vector owns the buffers and frees them automatically
    std::vector<float> vec1_array(VEC_NUM_ELEM);
    std::vector<float> vec2_array(VEC_NUM_ELEM);

    for (size_t i = 0; i < VEC_NUM_ELEM; i++)
    {
        vec1_array[i] = static_cast<float>(i);
    }

    test_execute(vec1_array.data(), vec2_array.data(), VEC_NUM_ELEM, coeff);

    // Print the scan result so the "seq" and "par" builds can be compared
    for (float v : vec2_array)
    {
        std::cout << v << ' ';
    }
    std::cout << std::endl;

    return 0;
}

Here is how I am building and linking, build.sh:

#!/bin/bash

rm -f *.o
rm -f *.exe
rm -f *.mexa64

cstd=c++17

gpp910=/home/m/compilers/bin/g++

tbblib=/home/m/reqs/tbb/lib/intel64/gcc4.8

echo "Building test_execute.cpp..."
# Compile only (-c): linker options such as -ltbb and -rpath have no effect here
$gpp910 -std=$cstd -I/home/m/reqs/tbb/include -fPIC -c test_execute.cpp

echo "Building test_standalone.cpp..."
$gpp910 -std=$cstd test_execute.o test_standalone.cpp -o test_standalone.exe -L$tbblib -ltbb -Wl,-rpath,$tbblib

echo "Building test_mex_gateway.cpp..."
mex test_execute.o test_mex_gateway.cpp -L$tbblib -ltbb

With GCC, the parallel STL algorithms require linking against Intel TBB (Threading Building Blocks), so before I launch Matlab to run test.m, or before I run test_standalone.exe, I run:

export LD_LIBRARY_PATH=/home/m/reqs/tbb/lib/intel64/gcc4.8:$LD_LIBRARY_PATH

I also make sure the C++ standard library that matches the GCC version we built with is available at runtime:

export LD_LIBRARY_PATH=/home/m/compilers/lib64:$LD_LIBRARY_PATH

When I run test_standalone.exe, everything behaves normally whether the execution policy on std::exclusive_scan is "par" or "seq". When I run test.m, a build compiled with "seq" runs without errors, but with "par" Matlab complains at runtime about a linkage issue:

Invalid MEX-file 'test_mex_gateway.mexa64': test_mex_gateway.mexa64: undefined symbol: _ZN3tbb10interface78internal20isolate_within_arenaERNS1_13delegate_baseEl

I suspected this was a function that was supposed to come from TBB, and confirmed that my libtbb.so.2 defines it:

$ nm /home/m/reqs/tbb/lib/intel64/gcc4.8/libtbb.so.2 | grep baseEl
0000000000028a30 T _ZN3tbb10interface78internal20isolate_within_arenaERNS1_13delegate_baseEl
000000000005ed70 r _ZN3tbb10interface78internal20isolate_within_arenaERNS1_13delegate_baseEl$$LSDA
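The mangled name can be decoded with c++filt or, from C++, with abi::__cxa_demangle, a GCC/Clang ABI extension declared in <cxxabi.h>. The demangle function below is a hypothetical wrapper around it. For the symbol in the error message it yields tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long), an internal TBB entry point that the "par" build references (it appears to be the machinery behind TBB's task isolation).

```cpp
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang C++ ABI extension)
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'd buffer, so release it with std::free
struct FreeDeleter
{
    void operator()(char* p) const { std::free(p); }
};

// Hypothetical helper: turns an Itanium-ABI mangled name into readable C++.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled)
{
    int status = 0;
    std::unique_ptr<char, FreeDeleter> readable(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status == 0 && readable)
        return std::string(readable.get());
    return std::string(mangled);
}
```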

I confirmed that Matlab's LD_LIBRARY_PATH contains the TBB path I exported above.

I tried making sure my library paths came before the many Matlab-specific paths that Matlab adds to LD_LIBRARY_PATH when it launches from the terminal.
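One way to double-check the search order from inside the process (for example, from the Mex function itself) is to split LD_LIBRARY_PATH and report where the TBB directory sits. The helpers below are hypothetical and only read the environment variable. Note that this order only matters when the dynamic loader actually has to search for a library: a library whose soname is already loaded into the process is reused without consulting LD_LIBRARY_PATH.

```cpp
#include <cstddef>
#include <cstdlib>   // std::getenv
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the directories of LD_LIBRARY_PATH, in search order.
std::vector<std::string> ld_library_path_entries()
{
    std::vector<std::string> dirs;
    const char* raw = std::getenv("LD_LIBRARY_PATH");
    if (raw == nullptr)
        return dirs;  // variable not set
    std::stringstream ss(raw);
    std::string dir;
    while (std::getline(ss, dir, ':'))
        dirs.push_back(dir);
    return dirs;
}

// Hypothetical helper: zero-based search position of dir, or -1 if absent.
long search_position(const std::string& dir)
{
    const std::vector<std::string> dirs = ld_library_path_entries();
    for (std::size_t i = 0; i < dirs.size(); ++i)
        if (dirs[i] == dir)
            return static_cast<long>(i);
    return -1;
}
```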

I tried baking the library path into the binaries by passing -Wl,-rpath=<path_to_tbb.so> to the linker.

After almost two days, I still can't figure out why Matlab hits this very specific runtime issue when the pure C++ version does not. Any help would be appreciated.

RHEL 7.9

Matlab R2020a

GCC 9.1.0

TBB (Intel Threading Building Blocks) 2020.3

Upvotes: 1

Views: 272

Answers (1)

mfeuling

Reputation: 51

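It appears that Matlab ships its own copy of libtbb.so in its installation. From what I can tell, when a Mex file is loaded, Matlab's copy is used first regardless of your LD_LIBRARY_PATH order: the dynamic loader reuses a library whose soname is already loaded into the process instead of searching LD_LIBRARY_PATH or the rpath again, and Matlab's bundled build apparently does not export the isolate_within_arena symbol that the "par" build references. This is why I got runtime errors with the Mex file but not with the pure C++ program, which only ever loads my TBB build. Removing libtbb.so from Matlab's installation directory allowed the runtime linker to find my version of libtbb, and I was able to run without errors. Thanks to Cris Luengo for pointing me in the right direction.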
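To confirm which libtbb a process actually uses, one option on Linux/glibc is dl_iterate_phdr from <link.h>, which visits every loaded object. The sketch below (hypothetical helper names) collects the paths of the loaded shared objects and filters them by substring. Called from inside mexFunction, with the results printed through mexPrintf for example, loaded_matching("libtbb") should show whether the loaded path is Matlab's bundled copy or the one under /home/m/reqs/tbb.

```cpp
#include <cstddef>
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc)
#include <string>
#include <vector>

// Hypothetical helper: paths of all shared objects mapped into this process.
// The main executable reports an empty name and is skipped.
std::vector<std::string> loaded_objects()
{
    std::vector<std::string> names;
    dl_iterate_phdr(
        [](dl_phdr_info* info, std::size_t, void* data) -> int {
            auto* out = static_cast<std::vector<std::string>*>(data);
            if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0')
                out->emplace_back(info->dlpi_name);
            return 0;  // 0 = continue with the next object
        },
        &names);
    return names;
}

// Hypothetical helper: loaded paths containing `fragment`, e.g. "libtbb".
std::vector<std::string> loaded_matching(const std::string& fragment)
{
    std::vector<std::string> hits;
    for (const std::string& name : loaded_objects())
        if (name.find(fragment) != std::string::npos)
            hits.push_back(name);
    return hits;
}
```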

Upvotes: 3
