Reputation: 47
I am a novice in programming and have just started using pthreads in C. I was curious how much multi-threading improves performance. To test this, I wrote a simple program that calculates the sum of the first n integers (honestly, I took it from a YouTube video). I gave it some really big numbers to get measurable execution times.
#include <stdio.h>
#include <pthread.h>

/* Each thread accumulates into its own global counter. */
long long sum = 0, pod = 1;

/* Thread 1: adds 0..*arg into sum. */
void *sum_run(void *arg)
{
    long long *var_ptr = (long long *)arg;
    long long i, var = *var_ptr;

    for (i = 0; i <= var; i++)
    {
        sum += i;
    }
    pthread_exit(0);
}

/* Thread 2: adds 0..*arg into pod. */
void *sum_run2(void *arg)
{
    long long *var_ptr2 = (long long *)arg;
    long long j, var2 = *var_ptr2;

    for (j = 0; j <= var2; j++)
    {
        pod += j;
    }
    pthread_exit(0);
}

int main(void)
{
    long long val = 999999999, val2 = 899999999;
    pthread_t tid[2];   /* one handle per thread */

    printf("wait getting it...\n");
    pthread_create(&tid[0], NULL, sum_run, &val);
    pthread_create(&tid[1], NULL, sum_run2, &val2);
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    printf("sum1 is %lld sum2 is %lld\n", sum, pod);
    return 0;
}
Oh, and by mistake I initialized the second long long variable, pod, to 1, which gave me a wrong result (1 more than expected). So I corrected my mistake and set pod=0, and that is where the PROBLEM appeared: after the change, my program's execution time more than doubled, and it became even slower than a program that does the same task without pthreads. I can't figure out what is happening inside. Please help me understand this program.
pod=1: execution time ~2.8 s
pod=0: execution time ~11.4 s
With sum=1 and pod=1, the execution time jumps to ~25.4 s.
Why does the execution time change when I change the initial values?
Also, I found that if one variable is 0 and the other is not, their addresses are not contiguous.
I am using Devcpp's gcc 4.9.2 with the -pthread switch.
Upvotes: 2
Views: 148
Reputation: 182847
You are seeing false sharing, caused by sum and pod being initialized the same way and therefore placed in close proximity in memory. This causes them to share a cache line.
As each thread tries to modify the cache line, it will find that the other thread has modified it last and the inter-core protocol has to be invoked to transfer ownership of the modified cache line from the other core to this core. The cache line will ping-pong back and forth and the two threads will run at the speed of the inter-core bus -- much worse than the speed of a single thread repeatedly hitting its L1 cache. This phenomenon is called false sharing.
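By initializing one to a nonzero value and leaving the other at zero, you caused them to be allocated in different segments (typically .data for the nonzero-initialized variable and .bss for the zero-initialized one). This made the false sharing go away, as they are now too far apart to share a cache line.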
A common solution to this problem is to put some padding between the two variables. For example, you could put them in a struct with long long spacing[7]; between them, which keeps them 64 bytes apart (a typical cache-line size).
Upvotes: 5