tomahna

Reputation: 57

Using omp parallel in a nested loop crashes due to concurrent writing?

I am trying to parallelise code that I have been using for a while without parallelisation and without issues, in a relatively large C++ project (compiled as C++11). The code relies heavily on the Eigen package (version 3.3.9 here). Below is a minimal version of the code that has to be parallelised and for which I get crashes (I hope I have not introduced errors while condensing it...):

The main function with the two for loop that need to be parallelised:

Data_eigensols solve(const VectorXd& nu_l0_in, const int el, const long double delta0l, 
    const long double DPl, const long double alpha, const long double q, const long double sigma_p, 
        const long double resol){

const int Nmmax=5000;

int np, ng, s, s0m;
double nu_p, nu_g;
VectorXi test;
VectorXd nu_p_all, nu_g_all, nu_m_all(Nmmax);
Data_coresolver sols_iter;
Data_eigensols nu_sols;

//----
// Some code that initialize several variables (that I do not show here for clarity).
// This includes initialising freq_max, freq_min, tol, nu_p_all, nu_g_all, deriv_p and deriv_g structures
//----

// Problematic loops that crash with omp parallel but work fine when not using omp
s0m=0;
#pragma omp parallel for collapse(2) num_threads(4) default(shared) private(np, ng)
        for (np=0; np<nu_p_all.size(); np++)
        {
                for (ng=0; ng<nu_g_all.size();ng++)
                {
                        nu_p=nu_p_all[np];
                        nu_g=nu_g_all[ng];
   
                        sols_iter=solver_mm(nu_p, nu_g); // Depends on several other functions that, for clarity, I do not show here (all those 'const long double' or 'const int' arguments)
                        //printf("np = %d, ng= %d, threadId = %d \n", np, ng, omp_get_thread_num());
                        for (s=0;s<sols_iter.nu_m.size();s++)
                        {
                                // Cleaning doubles: Assuming exact matches or within a tolerance range
                                if ((sols_iter.nu_m[s] >= freq_min) && (sols_iter.nu_m[s] <= freq_max))
                                {
                                        test=where_dbl(nu_m_all, sols_iter.nu_m[s], tol, 0, s0m); // This function returns -1 if there is no match for the condition (here means that sols_iter.nu_m[s] is not found in nu_m_all)
                                        if (test[0] == -1) // Keep the solution only if it was not pre-existing in nu_m_all
                                        {
                                                nu_m_all[s0m]=sols_iter.nu_m[s];
                                                s0m=s0m+1;
                                        }
                                }
                        }              
                }
        }
    nu_m_all.conservativeResize(s0m); // Reduced the size of nu_m_all to its final size

    nu_sols.nu_m=nu_m_all;
    nu_sols.nu_p=nu_p_all;
    nu_sols.nu_g=nu_g_all;
    nu_sols.dnup=deriv_p.deriv; 
    nu_sols.dPg=deriv_g.deriv;

return nu_sols;
}

The types Data_coresolver and Data_eigensols are defined as:

struct Data_coresolver{
    VectorXd nu_m, ysol, nu,pnu, gnu; 
};

struct Data_eigensols{
    VectorXd nu_p, nu_g, nu_m, dnup, dPg;
};

The where_dbl() is as follows:

VectorXi where_dbl(const VectorXd& vec, double value, const double tolerance){
/*
 * Gives the indexes of values of an array that match the value.
 * A tolerance parameter allows you to control how close the match
 * is considered as acceptable. The tolerance is in the same unit
 * as the value
 *
*/
   int cpt;
   VectorXi index_out;
  
   index_out.resize(vec.size());
    
    cpt=0;
    for(int i=0; i<vec.size(); i++){
        if(vec[i] > value - tolerance && vec[i] < value + tolerance){
            index_out[cpt]=i;
            cpt=cpt+1;
        }       
    }
    if(cpt >=1){
        index_out.conservativeResize(cpt);
    } else{
        index_out.resize(1);
        index_out[0]=-1;
    }
    return index_out;
}
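
For completeness, here is a minimal usage sketch of where_dbl() on its own (assuming only Eigen; the vector values and the tolerance are purely illustrative):

#include <Eigen/Dense>
#include <iostream>
using Eigen::VectorXd;
using Eigen::VectorXi;

// where_dbl() as defined above

int main(){
    VectorXd vec(4);
    vec << 1.0, 2.0, 2.0005, 3.0;
    VectorXi idx = where_dbl(vec, 2.0, 1e-3);  // indexes of values within 2.0 +/- 1e-3
    std::cout << idx.transpose() << std::endl; // prints "1 2"
    VectorXi none = where_dbl(vec, 5.0, 1e-3); // no value matches
    std::cout << none[0] << std::endl;         // prints "-1"
    return 0;
}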

Regarding solver_mm(): I do not detail this function, as it calls a few subroutines and would be too long to show here, and I do not think it is relevant. It is basically a function that solves an implicit equation.

What the main function is supposed to do: The main function solve() calls solver_mm() iteratively in order to solve an implicit equation under different conditions, where the only variables are nu_p and nu_g. Sometimes the solution for a pair (nu_p(i), nu_g(j)) duplicates the solution found for another pair (nu_p(k), nu_g(l)). This is why there is a section calling where_dbl() to detect those duplicated solutions and discard them, keeping only unique solutions.

What is the problem: Without the #pragma directive, the code works fine, but with it, it fails at a random point of the execution. After a few tests, it seems that the culprit is somehow related to the part that removes duplicate solutions. My guess is that there is concurrent writing to the nu_m_all VectorXd. I tried to use #pragma omp barrier without success, but I am quite new to OpenMP and may have misunderstood how the barrier works.

Can someone let me know why I get a crash here and how to solve it? The solution might be obvious to someone with good OpenMP experience.

Upvotes: 1

Views: 305

Answers (1)

Noureddine

Reputation: 190

nu_p, nu_g, sols_iter and test should be private. Since these variables are currently shared, multiple threads might write to the same memory region in a non-thread-safe manner. This might be your problem.
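
A minimal sketch of what the corrected loop could look like, with those per-iteration variables (plus the inner counter s) listed as private. The omp critical around the nu_m_all / s0m update is an extra assumption on my side, since those shared writes would otherwise still race; the rest relies on the functions and variables from your question:

#pragma omp parallel for collapse(2) num_threads(4) default(shared) \
        private(np, ng, s, nu_p, nu_g, sols_iter, test)
for (np = 0; np < nu_p_all.size(); np++)
{
    for (ng = 0; ng < nu_g_all.size(); ng++)
    {
        nu_p = nu_p_all[np];               // private: each thread has its own copy
        nu_g = nu_g_all[ng];
        sols_iter = solver_mm(nu_p, nu_g); // private: no concurrent writes to the struct
        for (s = 0; s < sols_iter.nu_m.size(); s++)
        {
            if ((sols_iter.nu_m[s] >= freq_min) && (sols_iter.nu_m[s] <= freq_max))
            {
                #pragma omp critical // assumption: serialise the shared nu_m_all / s0m update
                {
                    test = where_dbl(nu_m_all, sols_iter.nu_m[s], tol, 0, s0m);
                    if (test[0] == -1)
                    {
                        nu_m_all[s0m] = sols_iter.nu_m[s];
                        s0m = s0m + 1;
                    }
                }
            }
        }
    }
}

An equivalent (and often cleaner) option is to declare nu_p, nu_g, s, sols_iter and test inside the loop body, which makes them private automatically.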

Upvotes: 2
