I have the following code that I want to parallelize:
int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }
    int n = ncip(dim-1, R); // last coord 0
    #pragma omp parallel for
    for (i = 1; i <= r; ++i)
    {
        n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
    }
    return n;
}
The program's execution time without OpenMP is 6.956 s. When I try to parallelize the for loop, the execution time grows beyond 3 minutes (at which point I killed the program myself). What am I doing wrong in parallelizing this code?
Second attempt:
int ncip(int dim, double R)
{
    int i;
    int r = (int)floor(R);
    if (dim == 1)
    {
        return 1 + 2*r;
    }
    #pragma omp parallel
    {
        int n = ncip(dim-1, R); // last coord 0
        #pragma omp for reduction(+:n)
        for (i = 1; i <= r; ++i)
        {
            n += 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
        }
    }
    return n;
}
You are doing it wrong!

(1) There is a data race on the variable n. If you want to parallelize code where several threads write to the same memory location, you must use a reduction (on the for), atomic, or critical to avoid the data hazard.

(2) You probably have nested parallelism enabled, so the program creates a new parallel region every time the function ncip is called. This is most likely the main problem. For recursive functions I advise you to create just one parallel region and then use #pragma omp task.
Do not parallelize with #pragma omp for; try #pragma omp task instead. Look at this example:
int ncip(int dim, double R)
{
    ...
    #pragma omp task
    ncip(XX, XX);
    #pragma omp taskwait
    ...
}

int main(int argc, char *argv[])
{
    #pragma omp parallel
    {
        #pragma omp single
        ncip(XX, XX);
    }
    return 0;
}
UPDATE:
// Detailed version (without omp for and data races)
int ncip(int dim, double R)
{
    int n, r = (int)floor(R);
    if (dim == 1) return 1 + 2*r;

    n = ncip(dim-1, R); // last coord 0
    for (int i = 1; i <= r; ++i)
    {
        // n is a function-scope local, so it would default to firstprivate
        // inside the task; share it explicitly so the atomic update is visible
        #pragma omp task shared(n)
        {
            int aux = 2*ncip(dim-1, sqrt(R*R - i*i)); // last coord +- i
            #pragma omp atomic
            n += aux;
        }
    }
    #pragma omp taskwait
    return n;
}
PS: You will not get a speedup from this, because the overhead of creating a task is bigger than the work a single task does. The best thing you can do is rewrite this algorithm in an iterative version, and then try to parallelize that.