MA19
MA19

Reputation: 580

Segmentation fault when running an OpenMP parallel for loop

For the following code, I only want to parallelize its last part which calculates the second norm of each vector (length of each vector is different) but I am getting an error of segmentation fault. Also, I am not sure whether I am using reduction for the sum in the right place or not.

Another point is that I think I only need to parallelize the outer loop and there is no need to do this for an inner loop. Right?

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <omp.h>
#include <fstream>
#include <cfloat>
#include <chrono>
using namespace std;

int main()
{
    int N = 1000000;       
    unsigned size;
    vector<vector<double>> c;
    default_random_engine g(0);
    uniform_real_distribution<double> d(0.0f, nextafter(1.0f, DBL_MAX));
    vector<double> b;
    
    for (int i = 0; i < N; i++) {
        size = pow(10, i % 4);
        vector<double> a;

        for (int j = 0; j < size; j++) {
            double number = d(g);
            a.push_back(number);
        }

        c.push_back(a);
    }       
    
    int i, j;
    double sum; 

    #pragma omp parallel for num_threads(4) shared(N) private(i,j,c,b) reduction (+: sum)
    for (int i = 0; i <N ; i++) {
       double sum = 0;
       for (int j = 0; j < c[i].size();j++) {
           sum = sum + pow(c[i][j],2);
       }

       double n = sqrt(sum);
       b.push_back(n);
    }
}

Upvotes: 1

Views: 2105

Answers (1)

J&#233;r&#244;me Richard
J&#233;r&#244;me Richard

Reputation: 50956

The segmentation fault is caused by the private clause which does not copy the vectors. It initializes them to default empty vectors. Use firstprivate instead if you want to perform a copy from the "master" thread. That being said, c can be shared here.

Furthermore, here is several important points:

  • sum must be initialized to 0 (outside the loop);
  • the sum variable in the scope of the parallel loop shadow the sum variable outside it (the same also apply for i and j);
  • there is no need to declare a local sum, OpenMP do it for you;
  • you can move a to avoid unnecessary copies and reserve its size before using it (faster);
  • N does not need to be shared between thread (it is better to perform a local copy);
  • since b is private, adding value into it is useless unless it is read locally in each thread (it depends of what you want to do). If you want to read b outside the parallel region, you either need to add a critical section, to merge the thread-local vector parts manually (faster), or to use direct assignment (simplest solution and probably the fastest here).

Here is the corrected code:

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <omp.h>
#include <fstream>
#include <cfloat>
#include <chrono>
using namespace std;

int main()
{
    const int N = 1000000;
    vector<vector<double>> c;
    default_random_engine g(0);
    uniform_real_distribution<double> d(0.0f, nextafter(1.0f, DBL_MAX));
    c.reserve(N);

    for (int i = 0; i < N; i++) {
        const unsigned size = pow(10, i % 4);
        vector<double> a;
        a.reserve(size);

        for (int j = 0; j < size; j++) {
            const double number = d(g);
            a.push_back(number);
        }

        c.push_back(std::move(a));
    }

    double sum = 0.0;
    vector<double> b(N);

    #pragma omp parallel num_threads(4) firstprivate(N) shared(b,c,sum)
    {
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N ; i++) {
            double sumLocal = 0.0;

            for (int j = 0; j < c[i].size();j++) {
                sumLocal += pow(c[i][j], 2);
            }

            const double n = sqrt(sumLocal);
            b[i] = n;

            sum += sumLocal;
        }
    }
}

Upvotes: 3

Related Questions