Mex file executing in series despite parallel compile

I'm testing a basic OpenMP-parallelized code in a MEX file. The trouble is that it seems to run on just one thread even though I request 2 threads. Here is the code:

#include "mex.h"
#include "omp.h"

#include <iostream>


void mexFunction(int nlhs, mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
    using namespace std;
    #define x_out plhs[0]
    #define x_in prhs[0]

    double *x;
    double y;
    x_out=mxCreateDoubleMatrix(1,1,mxREAL);
    x=mxGetPr(x_out);
    y=mxGetScalar(x_in);

    x[0]=y;    
    omp_set_num_threads(2);
    int Nthreads=omp_get_num_threads();
    cout<<Nthreads<<"\n";
    #pragma omp parallel
    {
        int ithread=omp_get_thread_num();

        #pragma omp for
                for (int i=0;i<10;i++)
                    cout<<"Hello! " <<i<<"\n";
    }
    return;
}

I use the following compile line -

mex -v paralletestmex.cpp CC=g++ CFLAGS="\$CFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"

The -fopenmp flag shows up in the verbose output, so I assume the file is being compiled with OpenMP enabled.

The output I get is -

1
Hello! 0
Hello! 1
Hello! 2
Hello! 3
Hello! 4
Hello! 5
Hello! 6
Hello! 7
Hello! 8
Hello! 9

This shows that, for some reason, only 1 thread is being created. This is a simple test for a problem I'm facing in more complex code. When I build and run the same code as a plain C++ program, without mex, it works fine.

Any help is appreciated. Thank you! Siddharth

Upvotes: 3

Answers (3)

Cold Enough

Reputation: 38

I encountered the same issue, and it turns out that the mex compilation flags differ between platforms and compilers. For example, I am on Windows, where the flags to set up OpenMP should be

COMPFLAGS="$COMPFLAGS /openmp" ...
LDFLAGS='$LDFLAGS /openmp' ...              % Ensure OpenMP is linked

Refer to this page for more details.

Upvotes: 0

Ok, I did a fair bit of research, and it turns out that the CXXOPTIMFLAGS in the mexopts.sh file needs to be changed as well. So to the compile line I added:

CXXOPTIMFLAGS="\$CXXOPTIMFLAGS -fopenmp" 

and that seems to do the job.
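One way to confirm that the flag really reached the C++ compiler, rather than inferring it from the verbose output, is to check the standard _OPENMP macro, which an OpenMP-enabled compiler defines as a yyyymm version number. The following is only a minimal sketch; the helper names openmp_version_macro and openmp_build_status are made up for illustration and are not part of OpenMP or the MEX API.

```cpp
#include <string>

// Hypothetical helper: returns the value of the _OPENMP macro (yyyymm), or 0
// when this translation unit was compiled without OpenMP support.
long openmp_version_macro()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Hypothetical helper: human-readable summary of the same information.
std::string openmp_build_status()
{
    const long version = openmp_version_macro();
    if (version == 0)
        return "OpenMP disabled at compile time";
    return "OpenMP enabled, _OPENMP = " + std::to_string(version);
}
```

Printing the result of openmp_build_status() once from the MEX function should tell you whether the pragmas are being honoured or silently ignored.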

Thanks for all your help!

Upvotes: 2

Hristo Iliev

Reputation: 74455

This is an extremely common mistake: omp_get_num_threads() returns the number of threads in the current team. When called outside a parallel region, it always returns 1 since by definition OpenMP programs execute with a single thread only (the master thread) outside of the parallel regions.

The complementary call to omp_set_num_threads() is omp_get_max_threads(), which returns the upper bound on the team size of the next parallel region.
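To make the difference concrete, here is a small sketch that records the three values side by side. The struct ThreadCounts and the function query_thread_counts are hypothetical names used only for this illustration, and the serial fallbacks are my own addition so that the snippet still builds when OpenMP is switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks (assumption for illustration) so the sketch also builds
// when the compiler is invoked without OpenMP support.
static int omp_get_num_threads() { return 1; }
static int omp_get_max_threads() { return 1; }
static void omp_set_num_threads(int) {}
#endif

// Hypothetical container for the values being compared.
struct ThreadCounts {
    int max_before_region;  // omp_get_max_threads() outside any parallel region
    int team_outside;       // omp_get_num_threads() outside any parallel region
    int team_inside;        // omp_get_num_threads() inside a parallel region
};

ThreadCounts query_thread_counts(int requested)
{
    ThreadCounts c{0, 0, 0};
    omp_set_num_threads(requested);

    // Still in the sequential part: the current team has exactly one thread.
    c.max_before_region = omp_get_max_threads();
    c.team_outside = omp_get_num_threads();

    #pragma omp parallel
    {
        // One thread of the team records the real team size.
        #pragma omp single
        c.team_inside = omp_get_num_threads();
    }
    return c;
}
```

Called as query_thread_counts(2) from sequential code, team_outside is 1 regardless of the request, while team_inside reports the size of the team that was actually formed, which the runtime may make smaller than requested.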

Also note that calling omp_set_num_threads() is a very bad programming practice when it comes to writing modules and library functions. The reason is that it fixes the number of threads for all parallel regions that follow and thus might affect other code. A much better way to do it is to use the num_threads clause:

#pragma omp parallel num_threads(2)
{
   // ...
}
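Applied to the MEX function from the question, this could look roughly like the sketch below. It is not a definitive implementation: count_iterations_per_thread is a hypothetical helper, and I am assuming that the mex build defines MATLAB_MEX_FILE, so that the same file also compiles as plain C++ for testing. Since the MEX API is documented as not thread-safe, the sketch collects results inside the parallel region and prints with mexPrintf only afterwards, from the calling thread.

```cpp
#include <cstddef>
#include <vector>

#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs n loop iterations on a team of at most `requested`
// threads and returns how many iterations each thread executed.
std::vector<int> count_iterations_per_thread(int n, int requested)
{
    std::vector<int> counts(requested > 0 ? requested : 1, 0);
    const int team_limit = static_cast<int>(counts.size());

    // The clause affects only this region, not later ones.
    #pragma omp parallel num_threads(team_limit)
    {
        int tid = 0;  // stays 0 when built without OpenMP
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        #pragma omp for
        for (int i = 0; i < n; i++)
            counts[tid]++;  // each thread writes only to its own slot
    }
    return counts;
}

#ifdef MATLAB_MEX_FILE  // assumed to be defined by the mex build
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    const std::vector<int> counts = count_iterations_per_thread(10, 2);

    // Print from the calling thread only, after the parallel region has ended.
    for (std::size_t t = 0; t < counts.size(); t++)
        mexPrintf("thread %d ran %d iterations\n", static_cast<int>(t), counts[t]);

    plhs[0] = mxCreateDoubleScalar(static_cast<double>(counts.size()));
}
#endif
```

If every slot except the first stays at zero, the loop ran on a single thread, which usually means the OpenMP flag did not reach the compiler.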

Upvotes: 4
