Reputation: 63
I have code that runs many iterations, and the result of an iteration is saved only if a condition is met. This is naturally expressed as a while loop. I am attempting to make the code run in parallel, since each realisation is independent. So I have this:
while(nit<avit){
    #pragma omp parallel shared(nit,avit)
    {
        //do some stuff
        if(condition){
            #pragma omp critical
            {
                nit++;
                //save results
            }
        }
    }//implicit barrier here
}
and this works fine... but there is a barrier after each realisation, which means that if the stuff I am doing inside the parallel block takes longer in one iteration than in the others, all my threads wait for it to finish instead of continuing with the next iteration.
Is there a way to avoid this barrier so that the threads keep working? I am averaging thousands of iterations, so a few extra don't hurt (in case the nit variable has not yet been incremented in threads that are already running)...
I have tried to turn this into a parallel for, but the automatic increment in the for loop makes the nit variable go wild. This is my attempt:
#pragma omp parallel shared(nit,avit)
{
    #pragma omp for
    for(nit=0;nit<avit;nit++){
        //do some stuff
        if(condition){
            //save results
        } else {
            #pragma omp critical
            {
                nit--;
            }
        }
    }
}
and it keeps working and going around the for loop, as expected, but my nit variable takes unpredictable values... as one would expect from different threads incrementing and decrementing it at different times.
I have also tried leaving the increment in the for loop blank, but that doesn't compile, and I have tried tricking the code into having no effective increment in the for loop, like
...
incr=0;
for(nit=0;nit<avit;nit+=incr)
...
but then my code crashes...
Any ideas?
Thanks
Edit: Here's a working minimal example of the code on a while loop:
#include <random>
#include <vector>
#include <iostream>
#include <time.h>
#include <omp.h>
#include <stdlib.h>
#include <unistd.h>

using namespace std;

int main(){
    int nit,dit,avit=100,t,j,tmax=100,jmax=10;
    vector<double> Res(10),avRes(10);
    nit=0; dit=0;
    while(nit<avit){
        #pragma omp parallel shared(tmax,nit,jmax,avRes,avit,dit) private(t,j) firstprivate(Res)
        {
            srand(int(time(NULL)) ^ omp_get_thread_num());
            t=0; j=0;
            while(t<tmax&&j<jmax){
                Res[j]=rand() % 10;
                t+=Res[j];
                if(omp_get_thread_num()==5){
                    usleep(100000); // simulate one thread taking longer than the rest
                }
                j++;
            }
            if(t<tmax){
                // accepted realisation: accumulate and print the running averages
                #pragma omp critical
                {
                    nit++;
                    for(j=0;j<jmax;j++){
                        avRes[j]+=Res[j];
                    }
                    for(j=0;j<jmax;j++){
                        cout<<avRes[j]/nit<<"\t";
                    }
                    cout<<" \t nit="<<nit<<"\t thread: "<<omp_get_thread_num();
                    cout<<endl;
                }
            } else {
                // discarded realisation: just count it
                #pragma omp critical
                {
                    dit++;
                    cout<<"Discarded: "<<dit<<"\r"<<flush;
                }
            }
        }
    }
    return 0;
}
I added the usleep part to simulate one thread taking longer than the others. If you run the program, all threads have to wait for thread 5 to finish before they start the next run. What I am trying to do is precisely to avoid that wait, i.e. I'd like the other threads to pick up the next iteration without waiting for thread 5 to finish.
Upvotes: 5
Views: 16938
Reputation: 22670
You can basically follow the same concept as for this question, with a slight variation to ensure that avRes is not written to in parallel:
int nit = 0;
#pragma omp parallel
while(1) {
    int local_nit;
    #pragma omp atomic read
    local_nit = nit;
    if (local_nit >= avit) {
        break;
    }
    [...]
    if (...) {
        #pragma omp critical
        {
            #pragma omp atomic capture
            local_nit = ++nit;
            for(j=0;j<jmax;j++){
                avRes[j] += Res[j];
            }
            for(j=0;j<jmax;j++){
                // technically you could also use `nit` directly since
                // now `nit` is only modified within this critical section
                cout<<avRes[j]/local_nit<<"\t";
            }
        }
    } else {
        #pragma omp atomic update
        dit++;
    }
}
It also works with critical regions, but atomics are more efficient.
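For illustration, here is a minimal, self-contained sketch of the critical-only variant. The per-iteration work and the acceptance condition are dummy placeholders (not from your code), and the counter may overshoot avit by a few iterations, which you said is acceptable:

#include <iostream>
#include <omp.h>

int main() {
    const int avit = 100;   // target number of accepted iterations
    int nit = 0, dit = 0;   // accepted / discarded counters (shared)

    #pragma omp parallel
    while (1) {
        int local_nit;
        #pragma omp critical
        local_nit = nit;                  // snapshot of the shared counter, taken under the lock
        if (local_nit >= avit) break;     // enough accepted iterations: this thread stops

        // dummy stand-in for "do some stuff" and the acceptance condition
        bool condition = ((omp_get_thread_num() + local_nit) % 2 == 0);

        if (condition) {
            #pragma omp critical
            {
                ++nit;                    // accept: count it and save the results here
            }
        } else {
            #pragma omp critical
            ++dit;                        // reject: just count the discard
        }
    }

    std::cout << "accepted: " << nit << ", discarded: " << dit << std::endl;
    return 0;
}

Each thread keeps pulling iterations until the shared counter reaches avit, so no thread ever waits at a barrier for a slow iteration to finish.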
There's another thing you need to consider: rand() should not be used in parallel contexts. See this question. For C++, use a private (i.e. defined within the parallel region) random number generator from <random>.
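A minimal sketch of such a per-thread generator (the seeding scheme and names are just one possible choice, not prescribed here):

#include <iostream>
#include <random>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        // One generator per thread: declared inside the parallel region, so it is private.
        // Seeding with random_device combined with the thread number is one simple choice.
        std::random_device rd;
        std::mt19937 gen(rd() ^ static_cast<unsigned>(omp_get_thread_num()));
        std::uniform_int_distribution<int> dist(0, 9);   // stands in for rand() % 10

        #pragma omp critical
        std::cout << "thread " << omp_get_thread_num()
                  << " drew " << dist(gen) << std::endl;
    }
    return 0;
}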
Upvotes: 4