Reputation: 26335
I want to parallelize with OpenMP a function that samples a box (selecting points randomly in the box and evaluating a given function at these points). I wrote the following code:
// storing points
double** points_ = new double*[N-m];
for (int i = 0; i < N-m; i++)
{
    points_[i] = new double[ndim];
}
double* evals_ = new double[N-m];

#pragma omp parallel for
for (int i = 0; i < N-m; i++)
{
    double* pt_ = randomPoint(lower, upper);
    for (int k = 0; k < ndim; k++)
    {
        points_[i][k] = pt_[k];
    }
    evals_[i] = evalFunc(pt_);
    delete[] pt_;  // delete[], not delete, if randomPoint allocates with new[]
}
However, I am not confident in this code: evals_ and points_ are updated in each thread. I thought of adding some atomic statements there:
#pragma omp parallel for
for (int i = 0; i < N-m; i++)
{
    double* pt_ = randomPoint(m_lower, m_upper);
    for (int k = 0; k < m_ndim; k++)
    {
        #pragma omp atomic update
        points_[i][k] = pt_[k];
    }
    #pragma omp atomic update
    evals_[i] = evalFunc(pt_);
    delete[] pt_;
}
but I fear that this would be very inefficient: do you have some advice on how to write it more accurately? And... this does not compile (error: expression following #pragma omp atomic has improper form), although I can find this example in the OpenMP specs, A.22:
void atomic_example(float *x, float *y, int *index, int n)
{
    int i;
    #pragma omp parallel for shared(x, y, index, n)
    for (i = 0; i < n; i++) {
        #pragma omp atomic update
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
}
and there, atomic update is also followed by an assignment to an array element.
Thanks and regards.
EDIT--------
I agree with Tudor's answer. However, it seems that this example, in another parallelized piece of code, does need atomic: on the line sum_ += ..., an error occurs (concurrent access):
for (i = 0; i < m_ndim; i++)
{
    double sum_ = 0;
    #pragma omp parallel reduction(+:sum_)
    for (j = 0; j < m_npts; j++)
    {
        sum_ += set_[j][i];
    }
    Sum_[i] = sum_;
}
Why is it needed here, then? Or is something else wrong?
Upvotes: 4
Views: 1737
Reputation: 62439
You don't need any atomic clauses in your code.
The reason is that the outer loop is split over the index i, so each thread gets a set of elements from points_ and evals_ which does not overlap with another thread's working set.

evals_ is an array, so each thread will get a contiguous subarray (due to the implicit static scheduling scheme), e.g.
0 1 2 3 . 4 5 6 7 . 8 9 10 11 ...
   t1        t2         t3
points_ is a bidimensional matrix and each thread will get a contiguous set of rows:
      0
      1
t1    2
      3
      .
      4
      5
t2    6
      7
      .
      8
      9
t3    10
      11
      ...
In this second case it might appear that you have an overlap on the value of k, because each thread has the same range for k, but the updated points fall on different rows (the index i), which, as shown above, do not overlap between threads.
Upvotes: 3