Reputation: 119
I'm testing a basic OpenMP-parallelized code in a mex file. The trouble is that it seems to run with just one thread even though I ask for two. Here is the code:
#include "mex.h"
#include "omp.h"
#include <iostream>
void mexFunction(int nlhs, mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
using namespace std;
#define x_out plhs[0]
#define x_in prhs[0]
double *x;
double y;
x_out=mxCreateDoubleMatrix(1,1,mxREAL);
x=mxGetPr(x_out);
y=mxGetScalar(x_in);
x[0]=y;
omp_set_num_threads(2);
int Nthreads=omp_get_num_threads();
cout<<Nthreads<<"\n";
#pragma omp parallel
{
int ithread=omp_get_thread_num();
#pragma omp for
for (int i=0;i<10;i++)
cout<<"Hello! " <<i<<"\n";
}
return;
}
I use the following compile line -
mex -v paralletestmex.cpp CC=g++ CFLAGS="\$CFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
and in the verbose output the -fopenmp flag shows up, so I assume it is being compiled with OpenMP support.
The output I get is -
1
Hello! 0
Hello! 1
Hello! 2
Hello! 3
Hello! 4
Hello! 5
Hello! 6
Hello! 7
Hello! 8
Hello! 9
This shows that, for some reason, only one thread is being created. This is a simple test case for a problem I'm facing in more complex code. When I run the same code as a plain C++ program without mex, it works fine.
Any help is appreciated. Thank you! Siddharth
Upvotes: 3
Views: 633
Reputation: 38
I encountered the same issue, and it turns out that the mex compilation options differ between platforms. For example, I am using Windows, so the correct options to enable OpenMP are
COMPFLAGS="$COMPFLAGS /openmp" ...
LDFLAGS='$LDFLAGS /openmp' ... % Ensure OpenMP is linked
Refer to this page for more details.
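Combining the two options above into a single call (reusing the file name from the question) would look something like:
mex -v paralletestmex.cpp COMPFLAGS="$COMPFLAGS /openmp" LDFLAGS='$LDFLAGS /openmp'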
Upvotes: 0
Reputation: 119
OK, I did a fair bit of research, and it turns out that CXXOPTIMFLAGS in the mexopts.sh file needs to be set as well. So I added the following to the compile line:
CXXOPTIMFLAGS="\$CXXOPTIMFLAGS -fopenmp"
and that seems to do the job.
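For reference, the full compile line from the question would then look something like:
mex -v paralletestmex.cpp CC=g++ CFLAGS="\$CFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp" CXXOPTIMFLAGS="\$CXXOPTIMFLAGS -fopenmp"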
Thanks for all your help!
Upvotes: 2
Reputation: 74455
This is an extremely common mistake: omp_get_num_threads() returns the number of threads in the current team. When called outside a parallel region, it always returns 1, since by definition OpenMP programs execute with a single thread only (the master thread) outside of parallel regions.
The complementary call to omp_set_num_threads() is omp_get_max_threads().
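A minimal standalone sketch (not mex-specific) that illustrates the difference:

#include <omp.h>
#include <iostream>

int main()
{
    omp_set_num_threads(2);

    // Outside a parallel region the team consists of the master thread only,
    // so omp_get_num_threads() returns 1; omp_get_max_threads() returns 2 here.
    std::cout << omp_get_num_threads() << " / " << omp_get_max_threads() << "\n";

    #pragma omp parallel
    {
        // Inside the region the team size reflects the requested value.
        #pragma omp single
        std::cout << omp_get_num_threads() << "\n";
    }
}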
Also note that calling omp_set_num_threads() is a very bad programming practice when it comes to writing modules and library functions. The reason is that it fixes the number of threads for all parallel regions that follow and thus might affect other code. A much better way is to use the num_threads clause:
#pragma omp parallel num_threads(2)
{
// ...
}
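Applied to the code in the question, that would mean dropping the omp_set_num_threads() call and the thread-count query outside the region, and writing the parallel section of mexFunction roughly like this (a sketch that reuses the question's loop and its using namespace std):

#pragma omp parallel num_threads(2)
{
    // Query the team size inside the region, where it is meaningful.
    #pragma omp single
    cout << omp_get_num_threads() << "\n";

    #pragma omp for
    for (int i = 0; i < 10; i++)
        cout << "Hello! " << i << "\n";
}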
Upvotes: 4