Yulong Tian

Reputation: 107

Parallel threads in Linux

#include <iostream>
#include <time.h>
#include <pthread.h>
using namespace std;

void*genFunc2(void*val)
{
    int i,j,k;
    for(i=0;i<(1<<15);i++)
    {
        clock_t t1=clock();
        for(j=0;j<(1<<20);j++)
        {
            for(k=0;k<(1<<10);k++)
            {

            }
        }
        clock_t t2=clock();
        cout<<"t1:"<<t1<<" t2:"<<t2<<" t2-t1:"<<(t2-t1)/CLOCKS_PER_SEC<<endl;
    }
    return NULL; // a pthread start routine must return a void*
}

int main()
{
    cout<<"begin"<<endl;
    pthread_t ntid1;pthread_t ntid2;pthread_t ntid3;pthread_t ntid4;
    pthread_create(&ntid1,NULL,genFunc2,NULL);
    pthread_create(&ntid2,NULL,genFunc2,NULL);
    pthread_create(&ntid3,NULL,genFunc2,NULL);
    pthread_create(&ntid4,NULL,genFunc2,NULL);
    pthread_join(ntid1,NULL);pthread_join(ntid2,NULL);
    pthread_join(ntid3,NULL);pthread_join(ntid4,NULL);
    return 0;
}

My example is shown above. When I create just one thread, it prints its timing result in 2 seconds. However, when I create four threads, each thread takes 15 seconds to print its result. Why?

Upvotes: 0

Views: 632

Answers (1)

Soravux

Reputation: 9963

This kind of algorithm can easily be parallelized using OpenMP; I suggest you look into it to simplify your code.

That being said, you use the clock() function to time your runs. It does not report wall-clock time; it reports the number of clock ticks during which the CPU was busy executing your program. This can look strange because it may, for example, report 4 seconds when only 1 second has passed. That is perfectly logical on a 4-core machine: if all 4 cores were 100% busy running your threads, you used 4 seconds of computing time (in core⋅seconds). Dividing by the CLOCKS_PER_SEC constant gives elapsed seconds only when a single core is in use. Each of your cores accumulates ticks at CLOCKS_PER_SEC, which explains most of the discrepancy between your experiments.
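To make the difference between CPU time and wall-clock time concrete, here is a minimal sketch, assuming Linux with glibc (where clock() counts the CPU time of the whole process, summed over its threads). The names spin_for, Timing and measure are hypothetical helpers written for this illustration; they are not part of the question's code.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical helper: keep one core busy for roughly `ms` milliseconds.
void spin_for(int ms) {
    auto end = std::chrono::steady_clock::now() + std::chrono::milliseconds(ms);
    volatile unsigned long sink = 0;  // volatile so the loop is not optimized away
    while (std::chrono::steady_clock::now() < end) {
        sink = sink + 1;
    }
}

// Result of one measurement: CPU time from clock(), wall time from steady_clock.
struct Timing {
    double cpu_seconds;
    double wall_seconds;
};

// Hypothetical helper: run `num_threads` busy threads and time them both ways.
Timing measure(int num_threads, int ms) {
    assert(num_threads > 0);
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back(spin_for, ms);
    }
    for (auto& t : workers) {
        t.join();
    }

    std::clock_t c1 = std::clock();
    auto w1 = std::chrono::steady_clock::now();

    double cpu = static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    double wall = std::chrono::duration<double>(w1 - w0).count();
    return {cpu, wall};
}
```

On a machine with at least four idle cores, measure(4, 1000) would be expected to report close to 4 CPU-seconds against roughly 1 second of wall-clock time; with a single thread the two values should be about equal. Compile with -pthread.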

Furthermore, two things to keep in mind about your code:

  • You should disable all optimization (e.g. -O0 with gcc); otherwise your empty inner loops may be removed, depending on the compiler and other circumstances such as parallelization.
  • If your computer has only two physical cores with Hyper-Threading enabled (so the OS shows 4 cores), that may explain the remaining difference between your runs and my explanation above.

To measure wall-clock time with high resolution, you should use clock_gettime(CLOCK_MONOTONIC, &timer), as explained in this answer.
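As a rough sketch of that approach, the helpers below wrap clock_gettime(CLOCK_MONOTONIC, ...) to time a piece of work in wall-clock seconds. The names monotonic_seconds and wall_seconds are hypothetical, chosen for this example only; the sketch assumes a POSIX system.

```cpp
#include <cassert>
#include <time.h>
#include <utility>

// Hypothetical helper: current CLOCK_MONOTONIC reading, in seconds.
double monotonic_seconds() {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);  // CLOCK_MONOTONIC is assumed to be available
    (void)rc;         // avoid an unused-variable warning when NDEBUG is set
    return static_cast<double>(ts.tv_sec) + ts.tv_nsec / 1e9;
}

// Hypothetical helper: wall-clock seconds spent inside `work`.
template <typename F>
double wall_seconds(F&& work) {
    double start = monotonic_seconds();
    std::forward<F>(work)();
    return monotonic_seconds() - start;
}
```

In the question's code, the loop body could take one reading before the nested loops and one after, then print the difference instead of (t2-t1)/CLOCKS_PER_SEC. Because the monotonic clock measures elapsed time rather than CPU time, the result does not grow with the number of busy cores.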

Upvotes: 2
