Reputation: 163
I am trying to learn multi-threaded programming using OpenMP.
To begin with, I was testing a nested loop with a large number of array accesses and then parallelizing it. The code is attached below. Basically, I have a fairly large array tmp in the inner loop, and if I make it shared so that every thread can access and change it, my code actually slows down as the number of threads increases. I have written it so that every thread writes exactly the same values to tmp. When I make tmp private, I get a speedup proportional to the number of threads. The number of operations seems to me to be exactly the same in both cases. Why does it slow down when tmp is shared? Is it because different threads try to access the same address at the same time?
int main(){
    int k,m,n,dummy_cntr=5000,nthread=10,id;
    long num=10000000;
    double x[num],tmp[dummy_cntr];
    double tm,fact;
    clock_t st,fn;
    st=clock();
    omp_set_num_threads(nthread);
    #pragma omp parallel private(tmp)
    {
        id = omp_get_thread_num();
        printf("Thread no. %d \n",id);
        #pragma omp for
        for (k=0; k<num; k++){
            x[k]=k+1;
            for (m=0; m<dummy_cntr; m++){
                tmp[m] = m;
            }
        }
    }
    fn=clock();
    tm=(fn-st)/CLOCKS_PER_SEC;
}
P.S.: I am aware that clock() doesn't give the wall-clock time here, since it reports CPU time summed over all threads. I have to divide it by the number of threads in this case to get output similar to that of "time ./a.out".
Upvotes: 1
Views: 339
Reputation:
Your code has race conditions in tmp and m. I don't know what you are really trying to do, but this link might be helpful: Fill histograms (array reduction) in parallel with OpenMP without using a critical section
I tried cleaning up your code. This code allocates memory for tmp for each thread, which solves your problem with false sharing in tmp.
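#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main() {
    int k,m,dummy_cntr=5000;
    long num=10000000;
    double *x, *tmp;
    double dtime;

    /* x is too large for the stack, so put it on the heap */
    x = (double*)malloc(sizeof(double)*num);
    if (x == NULL) return 1;
    dtime = omp_get_wtime(); /* wall-clock time, unlike clock() */
    /* m is listed as private too: left shared, the inner loop counter races */
    #pragma omp parallel private(tmp, k, m)
    {
        /* each thread allocates its own scratch buffer */
        tmp = (double*)malloc(sizeof(double)*dummy_cntr);
        #pragma omp for
        for (k=0; k<num; k++){
            x[k]=k+1;
            for (m=0; m<dummy_cntr; m++){
                tmp[m] = m;
            }
        }
        free(tmp);
    }
    dtime = omp_get_wtime() - dtime;
    printf("%f\n", dtime);
    free(x);
    return 0;
}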
Compiled with
gcc -fopenmp -O3 -std=c89 -Wall -pedantic foo.c
Upvotes: 1
Reputation: 13396
This may be due to cache contention: if part of the array is accessed by two or more threads, it will be cached multiple times, one copy per core. When one core then needs to access it and the data have been changed by another core, it has to fetch the latest version from that core's cache, which takes some time. With every thread writing to the same shared tmp, this happens constantly.
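One way to avoid that contention is to give each thread its own scratch buffer by declaring it inside the parallel region. The following is only a minimal sketch under that assumption: the function name fill_private_scratch, the constant SCRATCH_LEN, and the checksum (added so the result can be checked and the writes are not optimized away) are made up for illustration and are not from the question.

```c
#include <assert.h>

#define SCRATCH_LEN 5000  /* assumed: same size as dummy_cntr in the question */

/* Hypothetical helper: runs the question's nested loop with a per-thread
   scratch buffer and returns a checksum of the values written. */
double fill_private_scratch(long n_outer)
{
    double checksum = 0.0;

    assert(n_outer >= 0);

    /* reduction gives every thread its own checksum and adds them at the end */
    #pragma omp parallel reduction(+:checksum)
    {
        /* Declared inside the parallel region, so each thread has its own
           tmp, k and m; no cache line of tmp is written by two threads. */
        double tmp[SCRATCH_LEN];
        long k;
        int m;

        #pragma omp for
        for (k = 0; k < n_outer; k++) {
            for (m = 0; m < SCRATCH_LEN; m++) {
                tmp[m] = m;
            }
            /* last element is always SCRATCH_LEN - 1 */
            checksum += tmp[SCRATCH_LEN - 1];
        }
    }
    return checksum;
}
```

Calling fill_private_scratch(num) from main and timing it with omp_get_wtime() should show the same scaling as the private(tmp) version. Without -fopenmp the pragmas are ignored and the function simply runs serially.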
Upvotes: 5