Reputation: 1561
I am trying to parallelise Gaussian elimination with partial pivoting using OpenMP.
Below is the relevant section of the code I wrote:
struct timeval tvBegin, tvEnd;
gettimeofday(&tvBegin, NULL);
for (k=1; k<=n-1; ++k) {
    amax = (double) fabs(a[k][k]);
    m = k;
    for (i=k+1; i<=n; i++) { /* Find the row with largest pivot */
        xfac = (double) fabs(a[i][k]);
        if (xfac > amax) { amax = xfac; m = i; }
    }
    if (m != k) { /* Row interchanges */
        rowx = rowx+1;
        temp1 = b[k];
        b[k] = b[m];
        b[m] = temp1;
        for (j=k; j<=n; j++) {
            temp = a[k][j];
            a[k][j] = a[m][j];
            a[m][j] = temp;
        }
    }
    #pragma omp parallel for private(i,j)
    for (i=k+1; i<=n; ++i) {
        xfac = a[i][k]/a[k][k];
        for (j=k+1; j<=n; ++j) {
            a[i][j] = a[i][j]-xfac*a[k][j];
        }
        b[i] = b[i]-xfac*b[k];
    } matrix_print_off (n, n, a);}
}
gettimeofday(&tvEnd, NULL);
printf("\nTime elapsed in ms: %d\n", diff_ms(tvEnd, tvBegin));
I tested this code with a 1000×1000 matrix. The average running time (measured via diff_ms) on a 4-core machine comes out the same (2142 ms) as for the sequential version of this code (without the pragma). Since the row updates in the elimination loop are independent of each other and should parallelise well, I expected a speedup. Could you please let me know where I went wrong?
For reference, I have also included the diff_ms function below.
int diff_ms(struct timeval t1, struct timeval t2)
{
    return (((t1.tv_sec - t2.tv_sec) * 1000) +
            (t1.tv_usec - t2.tv_usec)/1000);
}
Thanks!
Upvotes: 0
Views: 1885
Reputation: 9299
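As posted, matrix_print_off() is called on every iteration of the outer k loop, right after the parallel for, and all of that happens inside the region you are timing. Even assuming your print function is thread safe, the call is serial work that the extra threads cannot speed up, so it significantly reduces the amount of parallelism you can achieve. Additionally, if matrix_print_off() uses blocking I/O, its time may dominate the rest of the loop and hide any gain from the pragma.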
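To make this concrete, here is a minimal sketch of the elimination step with the print call taken out of the loop. It is an illustration under stated assumptions, not your exact code: the function name forward_eliminate, the flat row-major storage, and the 0-based indexing are all hypothetical choices made for the example. It also declares xfac inside the parallel loop. In the posted code xfac is assigned inside the parallel for but is not in the private list, so it appears to be shared between threads, which would be a data race.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not from the question): forward elimination with
 * partial pivoting on a row-major n x n matrix `a` and right-hand side `b`.
 * Uses 0-based indices, unlike the 1-based loops in the question.
 * Entries below the diagonal are left as-is, as in the posted code.
 * Assumes the matrix is non-singular (no check for a zero pivot).
 * Returns the number of row interchanges performed. */
int forward_eliminate(int n, double *a, double *b)
{
    assert(n > 0 && a != NULL && b != NULL);
    int swaps = 0;

    for (int k = 0; k < n - 1; ++k) {
        double *rk = a + (size_t)k * n;      /* pivot row */

        /* Serial pivot search: row with the largest |a[i][k]|. */
        int m = k;
        double amax = fabs(rk[k]);
        for (int i = k + 1; i < n; ++i) {
            double v = fabs(a[(size_t)i * n + k]);
            if (v > amax) { amax = v; m = i; }
        }

        if (m != k) {                        /* row interchange */
            double *rm = a + (size_t)m * n;
            double tb = b[k]; b[k] = b[m]; b[m] = tb;
            for (int j = k; j < n; ++j) {
                double t = rk[j]; rk[j] = rm[j]; rm[j] = t;
            }
            ++swaps;
        }

        /* Rows below the pivot are independent of each other, so they can
         * be split across threads. i is the parallel loop variable, and
         * ri, xfac, and j are declared inside the loop body, so every
         * thread works on its own copies. */
        #pragma omp parallel for schedule(static)
        for (int i = k + 1; i < n; ++i) {
            double *ri = a + (size_t)i * n;
            double xfac = ri[k] / rk[k];
            for (int j = k + 1; j < n; ++j)
                ri[j] -= xfac * rk[j];
            b[i] -= xfac * b[k];
        }
        /* No printing here: any output belongs after the timed region. */
    }
    return swaps;
}
```

Build with OpenMP enabled (for example `gcc -O2 -fopenmp`), time only the call to this function, and print the matrix once afterwards if you need to inspect it. Without `-fopenmp` the pragma is ignored and the same code runs serially, which gives a convenient baseline for comparison.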
Upvotes: 1