Reputation: 11
The following snippet from a C++ function was originally written as serial code. To parallelize the outer loop with counter 'jC', I just added the line "#pragma omp parallel for private(jC)". Although this naive approach has worked for me many times, I doubt whether it suffices to parallelize the jC-loop here, because the execution time seems to be unchanged with respect to the original code. Does anybody have suggestions to ensure the following code is effectively transformed into (correct) parallel code?
Thanks in advance, and my apologies if my question is not well posed (it is my first post on this forum).
The code snippet is:
#include "omp.h"

void addRHS_csource_to_pcellroutine_par(
    double *srcCoeff, double *srcVal, int nPc,
    double *adata, double *bdata, int elsize
)
{
    int elamax = elsize*elsize;
    int jC;
    #pragma omp parallel for private(jC)
    for (int jC = 0; jC < nPc; jC++) {
        for (int el = 0; el < elamax; el++) {
            adata[el + jC*elamax] = adata[el + jC*elamax] - srcCoeff[el + jC*elamax];
        }
        for (int el = 0; el < elsize; el++) {
            bdata[el + jC*elsize] = bdata[el + jC*elsize] + srcVal[el + jC*elsize];
        }
    }
}
Additional note: One (probably not the most elegant?) way to work around it consists of changing the code into
void addRHS_csource_to_pcellroutine_parFunction(int jC, int elamax,
    double *srcCoeff, double *srcVal, int nPc,
    double *adata, double *bdata, int elsize
)
{
    for (int el = 0; el < elamax; el++) {
        adata[el + jC*elamax] -= srcCoeff[el + jC*elamax];
    }
    for (int el = 0; el < elsize; el++) {
        bdata[el + jC*elsize] += srcVal[el + jC*elsize];
    }
}

void addRHS_csource_to_pcellroutine_par(
    double *srcCoeff, double *srcVal, int nPc,
    double *adata, double *bdata, int elsize
)
{
    int elamax = elsize*elsize;
    #pragma omp parallel for
    for (int jC = 0; jC < nPc; jC++) {
        addRHS_csource_to_pcellroutine_parFunction(jC, elamax, srcCoeff, srcVal, nPc, adata, bdata, elsize);
    }
}
Upvotes: 1
Views: 1648
Reputation: 1220
As you can read in the specification (page 55), your inner loops are not parallelized. Only the outer one is.
int jC;
#pragma omp parallel for private(jC)
for (int jC=0;......
You have defined two variables named jC. What you intended to do is correct, but you should decide on one solution:
int jC;
#pragma omp parallel for private(jC)
for(jC = 0;....
or
#pragma omp parallel for
for(int jC = 0;....
As for:
I doubt whether it suffices to parallelize the jC-loop, because the execution time seems to be unchanged with respect to the original code.
the sufficiency depends on the number of iterations you have to do (given by nPc) and on how many threads you provide (reasonably, 8 threads on a quad-core). Parallelizing a loop can even make it slower, because the overhead of creating the new threads is pretty high (plus some additional cost, such as managing the threads).
So the time you gain by parallelizing the loop has to exceed the time needed to create the threads.
Hope this answers your questions.
If you still need a faster program, you can think about an algorithm to parallelize the inner loops as well (e.g. by splitting the iteration space and using the OpenMP reduction construct).
Upvotes: 1