Reputation: 107
#include <iostream>
#include <time.h>
#include <pthread.h>
using namespace std;

// Thread body: repeatedly times an empty nested loop with clock().
void* genFunc2(void* val)
{
    int i, j, k;
    for (i = 0; i < (1 << 15); i++)
    {
        clock_t t1 = clock();
        for (j = 0; j < (1 << 20); j++)
        {
            for (k = 0; k < (1 << 10); k++)
            {
            }
        }
        clock_t t2 = clock();
        // Integer division: the difference is truncated to whole seconds.
        cout << "t1:" << t1 << " t2:" << t2 << " t2-t1:" << (t2 - t1) / CLOCKS_PER_SEC << endl;
    }
    return NULL; // a function returning void* must return a value
}

int main()
{
    cout << "begin" << endl;
    pthread_t ntid1, ntid2, ntid3, ntid4;
    pthread_create(&ntid1, NULL, genFunc2, NULL);
    pthread_create(&ntid2, NULL, genFunc2, NULL);
    pthread_create(&ntid3, NULL, genFunc2, NULL);
    pthread_create(&ntid4, NULL, genFunc2, NULL);
    // Wait for all four threads to finish.
    pthread_join(ntid1, NULL);
    pthread_join(ntid2, NULL);
    pthread_join(ntid3, NULL);
    pthread_join(ntid4, NULL);
    return 0;
}
My example is shown above. When I create just one thread, it prints its result in 2 seconds. However, when I create four threads, each thread takes 15 seconds to print its result. Why?
Upvotes: 0
Views: 632
Reputation: 9963
This kind of algorithm can easily be parallelized using OpenMP; I suggest you look into it to simplify your code.
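As a rough idea of what that could look like, here is a minimal sketch. The loop in the question has an empty body, so the workload below (a sum of squares) and the name sumSquares are purely hypothetical stand-ins. It assumes a compiler with OpenMP enabled (for example the -fopenmp flag with GCC); without it the pragma is ignored and the loop simply runs serially.

```cpp
// Hypothetical workload: sum of i*i for i in [0, n).
long long sumSquares(long long n) {
    long long total = 0;
    // With OpenMP enabled, iterations are split across threads and the
    // per-thread partial sums are combined by the reduction clause.
    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        total += i * i;
    }
    return total;
}
```

No explicit pthread_create or pthread_join calls are needed; the runtime decides how many threads to use.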
That being said, you use the clock() function to compute the execution time of your runs. It does not report the wall-clock time of your execution but the number of clock ticks during which the CPU was busy executing your program, summed over all of its threads. This can look strange because it may, for example, show 4 seconds when only 1 second has passed. This is perfectly logical on a 4-core machine: if all 4 cores were 100% busy in your threads, you used 4 seconds of computing time (in core⋅seconds units). Dividing by the CLOCKS_PER_SEC constant gives elapsed seconds only when a single core is busy. Each of your cores accumulates ticks at CLOCKS_PER_SEC, so the total grows several times faster than real time, which explains most of the discrepancy between your experiments.
Furthermore, two notes to take into account with your code:
To measure elapsed wall-clock time with high resolution, you should use the function clock_gettime(CLOCK_MONOTONIC, &timer), as explained in this answer.
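To make the difference concrete, here is a minimal sketch that measures the same threaded run both ways. The names secondsBetween, spin, Timing and timeThreads are hypothetical helpers made up for this illustration, and the busy loop is a stand-in workload. It assumes a POSIX system where clock() reports the CPU time of the whole process.

```cpp
#include <ctime>
#include <pthread.h>
#include <vector>

// Seconds between two clock_gettime() readings.
double secondsBetween(const timespec& start, const timespec& end) {
    return static_cast<double>(end.tv_sec - start.tv_sec)
         + static_cast<double>(end.tv_nsec - start.tv_nsec) / 1e9;
}

// Hypothetical CPU-bound worker; volatile keeps the loop from being optimized away.
void* spin(void*) {
    volatile unsigned long counter = 0;
    for (unsigned long i = 0; i < 20000000UL; ++i) {
        counter = counter + 1;
    }
    return NULL;
}

struct Timing {
    double wallSeconds; // elapsed real time, from CLOCK_MONOTONIC
    double cpuSeconds;  // processor time of the whole process, from clock()
};

// Runs threadCount workers and measures the run with both clocks.
Timing timeThreads(int threadCount) {
    timespec wallStart, wallEnd;
    clock_gettime(CLOCK_MONOTONIC, &wallStart);
    clock_t cpuStart = clock();

    std::vector<pthread_t> threads(threadCount);
    for (int i = 0; i < threadCount; ++i) {
        pthread_create(&threads[i], NULL, spin, NULL);
    }
    for (int i = 0; i < threadCount; ++i) {
        pthread_join(threads[i], NULL);
    }

    clock_t cpuEnd = clock();
    clock_gettime(CLOCK_MONOTONIC, &wallEnd);

    Timing result;
    result.wallSeconds = secondsBetween(wallStart, wallEnd);
    result.cpuSeconds = static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC;
    return result;
}
```

Calling timeThreads(1) and then timeThreads(4) on a machine with at least four idle cores should show wallSeconds staying roughly constant while cpuSeconds grows with the number of threads. The exact figures depend on the hardware and the scheduler.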
Upvotes: 2