Hải Phú Vũ
Hải Phú Vũ

Reputation: 31

Implement program Monte Carlo calculation pi use serial and multithread

When i implement solution for Monte Carlo method to calculate pi number, in which the circle with center has the coordinates (0,0), the radius is 1 and is inscribed in the square, the given formula is as follows: pi = 4 * (number of points in circle) / (total number of points ). I implemented the solution using 2 programs serial.c (single process) and multithread.c (multi-thread process) to compare the execution speed of these two solutions.

Program serial.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

void circle_point();
static long int total_point;
static long int count_circle=0;
clock_t start_time, end_time;

int main(int argc, char *argv[]){
    if(argc==1){
       printf("Enter number point\n");
       return -1;
    }

    if(argc!=2){
       printf("Argument is wrong\n");
       return -1;
    }
   total_point=atoll(argv[1]);

   start_time=clock();
   circle_point();
   double pi=4.0*(double)count_circle/(double)total_point;
   end_time=clock();

   printf("PI = %17.15f\n",pi);
   printf("Time to compute= %g second\n",(double)(end_time-start_time)/CLOCKS_PER_SEC);
   return 0;
}

void circle_point(){
   srand(time(NULL));
   int i;
   for(i=0;i<total_point;i++){
       double x= (double)rand()/(double)RAND_MAX;
       double y=(double)rand()/(double)RAND_MAX;
       double r= sqrt(x*x+y*y);
       if(r<=1) count_circle+=1;
    }
}

Program multithread.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <pthread.h>

#define NUM_THREAD 4

void* circle_point(void *param);

pthread_t tid[NUM_THREAD]={0};
int count[NUM_THREAD]={0};
clock_t start_time, end_time;
static long int total_point;
static long int count_circle=0;


int main(int argc, char const *argv[]){
   if(argc==1){
       printf("Enter number point\n");
       return -1;
      }

   if(argc!=2){
       printf("Argument is wrong\n");
       return -1;
     }

   total_point=atoll(argv[1])/NUM_THREAD;

   start_time= clock();
   srand(time(NULL));
   static int i;
   for(i=0; i<NUM_THREAD;i++)
   pthread_create(&tid[i],NULL,circle_point,&count[i]);
   for(i=0;i<NUM_THREAD;i++){
       pthread_join(tid[i],NULL);
       count_circle+=count[i];
     }
   double pi=4.0*(double)count_circle/(double)total_point/(double)NUM_THREAD;
   end_time=clock();

   printf("PI = %17.15f\n",pi);
   printf("Time to compute= %g second\n",(double)(end_time-start_time)/CLOCKS_PER_SEC);
   return 0;
  }

 void* circle_point(void *param){
   int *pcount= (int*)param;
   int i;
   for(i=0; i<total_point;i++){
       double x= (double)rand()/(double)RAND_MAX;
       double y=(double)rand()/(double)RAND_MAX;
       double r= sqrt(x*x+y*y);
       if(r<=1) *pcount=*pcount+1;
     }
   pthread_exit(0);
 }

The problem I encountered is that with the same total number of points input, for example 10^7, the execution speed of the serial is 0.45 seconds while the execution speed of multithread is 6.06 seconds, similar to other inputs, the execution time of the serial is always greater than the multithread. I do not understand what problems I have encountered? Because in theory, multithread will execute faster than serial, so is there any case in fact that serial execute faster than multithread as I have encountered.

Upvotes: 2

Views: 2989

Answers (2)

Chris Hall
Chris Hall

Reputation: 1772

You should seriously consider using something better than rand() for Monte Carlo.

But for multi-threaded you should at least use rand_r(). POSIX says:

The rand() function need not be thread-safe.

From a performance perspective, the rand() in all the threads is reading/writing the same state, which is bad -- and will be worse if it is doing so thread-safely !

But also, as @Hitokiri says, you need to divide the total_point count by the number of threads to see a speed-up !

You want to make sure that the threads interact as little as possible. In pcount you have a pointer to an entry in the int count[NUM_THREAD] array, and every time around the inner loop you *pcount=*pcount+1. In principle this means that every time around the loop the threads are competing for write access to the cache-line which contains each one's counter. It may be that the compiler is clever enough to spot that, and hoist the actual memory reads/writes out of the loop. But if I were you, I would declare a count local to circle_point() and: count += (r <= 1.0) ; in the loop -- writing the result to *pcount at the end.

I expect the compiler also loads total_point into a register at the start of the loop.

However, in general, you want to be very clear what variables are shared by threads, and how that is achieved.

As it happens:

  • total_point is only read by the threads, which keeps things simple.

  • total_point is written (initialised) before any of the threads which read it are created, and not changed again.

so, for total_point things are safe.

But in general, making a local (to the thread) copy of such variables:

  1. does no harm,

  2. forces you to consider whether something more sophisticated is required -- such as protecting access to shared state with a mutex (or other mechanism),

  3. avoids possible contention for cache-line(s).

It is safest to avoid using global variables in a thread. I prefer to construct a per thread copy of everything a thread needs, and pass that in at pthread_create() time.

Upvotes: 1

Hitokiri
Hitokiri

Reputation: 3699

Single thread:

void circle_point(){
   srand(time(NULL));
   int i;
   for(i=0;i<total_point;i++){
       double x= (double)rand()/(double)RAND_MAX;
       double y=(double)rand()/(double)RAND_MAX;
       double r= sqrt(x*x+y*y);
       if(r<=1) count_circle+=1;
    }
}

Multi-thread:

void* circle_point(void *param){
   int *pcount= (int*)param;
   int i;
   for(i=0; i<total_point;i++){
       double x= (double)rand()/(double)RAND_MAX;
       double y=(double)rand()/(double)RAND_MAX;
       double r= sqrt(x*x+y*y);
       if(r<=1) *pcount=*pcount+1;
     }
   pthread_exit(0);

I guess you use the same total_point for each thread of multi-threads program as in single-thread (it means that in both cases, total_point = 10^7 for each thread in second program). So in the multi thread program, you calculate with NUM_THREAD * total_point points. You can try with NUM_THREAD = 1, the execution times in both case are similar (second program maybe a bit higher). Or you can measure the execution time of each thread to verify that.

If you want to compare the speed of two programs, you have to make the multi-thread program calculate the same total number of point as the single thread. For example, first thread calculates 10^7/4 points, then second thread calculates 10^7/4 points, and so on.

In theory, a multi-threaded program can finish faster than a sequential one, because some of the work it does can proceed simultaneously. The sum of time spent on all processors will be higher than a sequential version (because of the added coordinating stuff), but the elapsed time from start to finish may be shorter.

Upvotes: 1

Related Questions