Reputation: 51
I am putting together a minimal example leveraging parallelism features in C++17/20 within Matlab MEX functions. I am able to compile and run the mex function from Matlab, but when I set the execution policy of my C++ STL function to "par" instead of "seq", Matlab gives a runtime linkage complaint. Code and error message follows:
test.m (Matlab top-level script):
vec_in = zeros(5);
coeff = 0.05;
vec_out = test_mex_gateway(vec_in, coeff);
test_mex_gateway.cpp (C++ interface to Matlab):
#include "mex.h"
extern void test_execute(float *array_in, float *array_out, const size_t vec_size, const float coeff);
void mexFunction( int nlhs,
mxArray *plhs[],
int nrhs,
const mxArray *prhs[] )
{
// Check for proper number of input and output arguments
if( nrhs != 2 )
{
mexErrMsgTxt( "3 input arguments required: input_data, coeff" );
}
if( nlhs > 2 )
{
mexErrMsgTxt( "Too many output arguments." );
}
const mwSize *matlab_data_dims_in;
mwSize matlab_data_dims_out[1];
// Input Parameters
float *input_data = (float *) mxGetData(prhs[0]);
float coeff = mxGetScalar(prhs[1]);
// Get dimensions
matlab_data_dims_in = mxGetDimensions(prhs[0]);
const int vec_len = matlab_data_dims_in[1];
// Set output data dimension
matlab_data_dims_out[0] = vec_len;
// Output data
plhs[0] = mxCreateNumericArray(1, matlab_data_dims_out, mxSINGLE_CLASS, mxREAL);
float *output_data = (float *) mxGetData(plhs[0]);
test_execute(input_data, output_data, vec_len, coeff);
}
test_execute.cpp (This is where the actual C++ STL call is made):
#include <execution> // std::execution::*
#include <numeric> // std::exclusive_scan()
void test_execute(float *array_in, float *array_out, const size_t vec_size, const float coeff)
{
std::exclusive_scan
(
std::execution::par, // std::execution::seq works here for Mex call, par does not
array_in,
array_in + vec_size,
array_out,
0.0f,
[coeff](float a, float b)
{
float ret = a + b + coeff;
return ret;
}
);
}
I also have a stand-alone main function to replace the Mex wrapper to do a pure C++ test, test_standalone.cpp:
#include <vector>
#include <iostream>
size_t VEC_NUM_ELEM = 10;
extern void test_execute(float *array_in, float *array_out, const size_t vec_size, const float coeff);
int main(int argc, char **argv)
{
if (argc != 2)
{
std::cout << "Try: " << argv[0] << "<coeff>" << std::endl;
return -1;
}
const float coeff = std::stof(argv[1]);
std::cout << "Coeff: " << coeff << std::endl;
float __attribute__ ((aligned (64))) *vec1_array = (float *)malloc(VEC_NUM_ELEM * sizeof(float));
float __attribute__ ((aligned (64))) *vec2_array = (float *)malloc(VEC_NUM_ELEM * sizeof(float));
for (unsigned i = 0; i < VEC_NUM_ELEM; i++)
{
vec1_array[i] = static_cast<float>(i);
}
test_execute(vec1_array, vec2_array, VEC_NUM_ELEM, coeff);
return 0;
}
Here is how I am building and linking, build.sh:
#!/bin/bash
rm *.o
rm *.exe
rm *.mexa64
cstd=c++17
gpp910=/home/m/compilers/bin/g++
tbblib=/home/m/reqs/tbb/lib/intel64/gcc4.8
echo "Building test_execute.cpp..."
$gpp910 -std=$cstd -I/home/m/reqs/tbb/include -L$tbblib -ltbb -Wl,rpath=$tbblib -c test_execute.cpp -fPIC
echo "Building test_standalone.cpp..."
$gpp910 -std=$cstd -L$tbblib test_execute.o test_standalone.cpp -o test_standalone.exe -ltbb
echo "Building test_mex_gateway.cpp..."
mex test_execute.o test_mex_gateway.cpp -L$tbblib -ltbb
The parallel STL calls has a requirement to link against the Intel TBB (Threading Building Blocks), so before I run Matlab to call test.m OR before I run my test_standalone.exe, I run:
export LD_LIBRARY_PATH=/home/m/reqs/tbb/lib/intel64/gcc4.8:$LD_LIBRARY_PATH
I also make sure to make the the C++ library associated with the version of GCC we built with available at runtime:
export LD_LIBRARY_PATH=/home/m/compilers/lib64:$LD_LIBRARY_PATH
When I run test_standalone.exe, everything behaves normally whether I have the execution policy set to "par" or "seq" on std::exclusive_scan. When run test.m, if "seq" was compiled, I can run with no errors. If "par" was compiled, Matlab complains at runtime about a linkage issue:
Invalid MEX-file 'test_mex_gateway.mexa64': test_mex_gateway.mexa64: undefined symbol: _ZN3tbb10interface78internal20isolate_within_arenaERNS1_13delegate_baseEl
I suspect this was a function that was supposed to be linked from TBB, which I confirmed:
$ nm /home/m/reqs/tbb/lib/intel64/gcc4.8/libtbb.so.2 | grep baseEl
0000000000028a30 T _ZN3tbb10interface78internal20isolate_within_arenaERNS1_13delegate_baseEl
000000000005ed70 r _ZN3tbb10interface78internal20isolate_within_arenaERNS1_13delegate_baseEl$$LSDA
I confirmed Matlab's LD_LIBRARY_PATH has the path I supplied in the above "export .." to this library.
I tried making sure my libraries came before the many Matlab-centric paths Matlab adds to LD_LIBRARY_PATH after it launches from the terminal.
I tried baking the path to the linked libraries via a -Wl,rpath=<path_to_tbb.so> passage to the linker.
After almost two days, I can't figure out why Matlab is having this very specific runtime issue, especially when the pure C++ version is not. Any help would be appreciated.
RHEL 7.9
Matlab R2020a
GCC 9.1.0
TBB (Intel Thread Building Blocks) 2020.3
Upvotes: 1
Views: 272
Reputation: 51
It appears that Matlab comes with a version of libtbb.so included in its installation. From what I can tell, when launching a Mex file, Matlab will use its own libraries first, regardless of your LD_LIBRARY_PATH order. This is what was giving me runtime issues as a Mex file but not as a pure C++ file. Removing the libtbb.so from Matlab's installation directory allowed runtime linkage to find my version of libtbb, and I was able to run without errors. Thanks to Cris Luengo for pointing me in the right direction.
Upvotes: 3