kiriloff
kiriloff

Reputation: 26335

openMP - concurrent access to variables and atomic

I want to parallelize with openMP a function that is sampling a box (selecting points randomly in a box, and evaluating a given function at these points). I wrote the following code.

//storing points
double** points_ = new double*[N-m];
for(int i=0;i<N-m;i++)
{
    points_[i]=new double[ndim];
}
double* evals_ = new double[N-m];

#pragma omp parallel for 
for(int i=0;i<N-m;i++)
{
    double* pt_ = randomPoint(lower,upper);
    for(int k=0;k<ndim;k++)
    {
        points_[i][k]=pt_[k];
    }
    evals_[i]=evalFunc(pt_);
    delete pt_;
}

However, I am not confident with this code: evals_ and points_ are updated in eauch thread. I think of adding some atomic statements there:

#pragma omp parallel for 
for(int i=0;i<N-m;i++)
{
    double* pt_ = randomPoint(m_lower,m_upper);
    for(int k=0;k<m_ndim;k++)
    {
        #pragma omp atomic update
        points_[i][k]=pt_[k];
    }
    #pragma omp atomic update
    evals_[i]=evalFunc(pt_);
    delete pt_;
}

but I fear that this would be very unefficient: do you have some advice to write it more accurately? And... this is not compiling... (error: expression following #pragma omp atomic has improper form) althought I can find that examlpe in openMP specs, A22

void atomic_example(float *x, float *y, int *index, int n)
{
    int i;
    #pragma omp parallel for shared(x, y, index, n)
    for (i=0; i<n; i++) {
    #pragma omp atomic update
    x[index[i]] += work1(i);
    y[i] += work2(i);
    }
}

and atomic update is also followed here by an affectation to an array.

Thanks and regards.

EDIT--------

I agree with Tudor's answer. However, it seems that this example here, in another parallelized piece of code does need atomic: on line sum_+=..., an error occurs (concurrent access)

for(i=0;i<m_ndim;i++)
{
    double sum_=0;
    #pragma omp parallel reduction(+:sum_)
    for(j=0;j<m_npts;j++)
    {
        sum_ += set_[j][i];
    }
    Sum_[i] = sum_;
}

Why is it then needed? Or is smthing else wrong?

Upvotes: 4

Views: 1737

Answers (1)

Tudor
Tudor

Reputation: 62439

You don't need any atomic clauses in your code.

The reason is the fact that the outer loop is split over the index i, so each thread will get a set of elements from points_ and eval_ which do not overlap with another thread's working set.

evals_ is an array, so each thread will get a contiguous subarray (due to the implicit static scheduling scheme), e.g.

0 1 2 3 . 4 5 6 7 . 8 9 10 11...
   t1        t2        t3

points_ is a bidimensional matrix and each thread will get a contiguous set of rows:

   0 
   1
t1 2
   3
   .
   4
   5
t2 6
   7
   .
   8
   9
t3 10
   11
   ...

In this second case it would appear that you have an overlap on the value of k, because each thread has the same range for k, but the updated points fall on different rows (the index i), which, as shown above, do not overlap for different threads.

Upvotes: 3

Related Questions