Deepak

Reputation: 149

Unable to figure out where the race condition is occurring in an OpenMP program in C

I am trying to integrate sin(x) from 0 to pi, but every time I run the program I get a different output. I know this is caused by a race condition, but I am unable to figure out where the problem lies.
This is my code:

#include<stdio.h>
#include<stdlib.h>
#include<omp.h>
#include<math.h>
#include<time.h> 
#define NUM_THREADS 4
static long num_steps= 10000000;



float rand_generator(float a )
{
    //srand((unsigned int)time(NULL));
     return ((float)rand()/(float)(RAND_MAX)) * a;
}



int main(int argc, char *argv[])
{
   // srand((unsigned int)time(NULL));
   omp_set_num_threads(NUM_THREADS);
   float result;
   float sum[NUM_THREADS];


   float area=3.14;
   int nthreads;

#pragma omp parallel

{

     int id,nthrds;

    id=omp_get_thread_num();
    sum[id]=0.0;
    printf("%d\n",id );
    nthrds=omp_get_num_threads();
    printf("%d\n",nthrds );
    //if(id==0)nthreads=nthrds;
    for (int i = id; i < num_steps; i=i+nthrds)
    {
        //float y=rand_generator(1);
        //printf("%f\n",y );
        float x=rand_generator(3.14);
        sum[id]+=sin(x);
    }
    //printf(" sum is:  %lf\n", sum);
    //float p=(float)sum/num_steps*area;

   }


   float p=0.0;     
   for (int i = 0; i <NUM_THREADS; ++i)
   {
   p+=(sum[i]/num_steps)*area;
   }

   printf(" p is: %lf\n",p );

   }

I tried adding #pragma omp atomic, but that doesn't help either.

Any help will be appreciated :).

Upvotes: 1

Views: 79

Answers (1)

Alain Merigot

Reputation: 11537

The problem comes from the use of rand(). rand() is not thread-safe: it keeps a single shared state across all calls, so concurrent calls from several threads race on that state. See also the related question "Using stdlib's rand() from multiple threads".

There is a thread-safe random generator called rand_r(). Instead of storing the generator state in a hidden global variable, it takes the state as a parameter, so the state can be made thread-local.

You can use it like this:

float rand_generator_r(float a,unsigned int *state )
{
    //srand((unsigned int)time(NULL));
     return ((float)rand_r(state)/(float)(RAND_MAX)) * a;
}

In your parallel block, add:

 unsigned int rand_state=id*time(NULL); // or whatever thread dependent seed

and in your code call

   float x=rand_generator_r(3.14,&rand_state);

and it should work.

By the way, I have the impression that there is false sharing in your code, which is likely to hurt performance.

 float sum[NUM_THREADS];

It is modified by all threads and is very likely to be stored in a single cache line. Every store (and there are many stores to it) invalidates that line in all the other caches, which may significantly slow down your program.

You should ensure that the values are in different cache lines with:

#define CACHE_LINE_SIZE 64
struct {
  float s;
  char padding[CACHE_LINE_SIZE - sizeof(float)];
} sum_nofalse_sharing[NUM_THREADS];

and in your code, accumulate in sum_nofalse_sharing[id].s.

Alternatively, create a local sum in the parallel block and write its value to sum[id] at the end.

Upvotes: 3
