Reputation: 31
I have a class that contains a dynamic 3D array. An object of the class calls a function that does some calculation to populate a 1D array and, at the end, populates the object's 3D array with the 1D array data. The 1D array has the same size as the 3D array.
I am using OpenMP to speed up the calculation. Single-threaded execution gives the correct result, but as soon as I go to multiple threads I get weird results.
Sample code is given below. Kindly help me resolve the issue.
class A {
public:
    void func(float *buf);
    void populateRes(float *t);
private:
    float ***res;
};

A a[n];
int nthrd = omp_get_num_threads();
float *buf;
while (cnt < nz)
{
    #pragma omp parallel shared(cnt) private(buf, tid, omp_i)
    {
        if (cnt == 0)
            buf = new float[x*y*z];

        #pragma omp for
        for (omp_i = 0; omp_i < n; omp_i++)
        {
            a[omp_i].func(buf);
            a[omp_i].populateRes(buf);
        }
    }
    cnt++;
    if (cnt >= nz)
        delete [] buf;
}
Upvotes: 0
Views: 232
Reputation: 74485
OpenMP does not preserve the values of private variables between different entries into the same parallel region, the same way that the values of automatic local variables are not preserved between different invocations of a function (unless they are declared static). In fact, in most OpenMP implementations, parallel regions are separate functions and private variables are automatic.
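For instance, here is a minimal sketch of that behaviour (the variable name val and the printed message are purely illustrative, not taken from your code):

#include <omp.h>
#include <cstdio>

int main()
{
    int val = 42;

    #pragma omp parallel private(val)
    {
        val = 100 + omp_get_thread_num();  // each thread writes its own private copy
    }

    #pragma omp parallel private(val)
    {
        // val is a fresh, uninitialised private copy here; it does NOT retain
        // the value assigned in the previous parallel region (reading it before
        // assigning would be undefined behaviour)
        val = 0;
        printf("thread %d starts again from val = %d\n", omp_get_thread_num(), val);
    }

    return 0;
}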
This makes your code erroneous since buf would only be allocated on the first iteration of the loop, but in the next iterations your code would operate on a new uninitialised local copy. It might happen (by pure chance) that the content of the stack of a particular thread is not changed and thus buf retains its value. Also, deleting buf outside of the parallel region ignores the fact that there were multiple calls to new.
If you'd like to allocate buf only once, you should put the while loop inside the parallel region and not vice versa. This would also improve the performance, since the parallel region would be entered only once and there is an overhead associated with each entry.
A a[n];

#pragma omp parallel
{
    float *buf = new float[x*y*z];

    for (int cnt = 0; cnt < nz; cnt++)
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
        {
            a[i].func(buf);
            a[i].populateRes(buf);
        }
    }

    delete [] buf;
}
(I don't see tid being used inside, so I've taken the liberty of removing it from the list of private variables.)
Changing the nesting of the two loops is possible since there is an implicit barrier at the end of the for worksharing construct. I don't know if there is other code that you've omitted, but given the code from the question, the cnt loop could even be nested inside the worksharing construct, i.e.:
#pragma omp parallel
{
    float *buf = new float[x*y*z];

    #pragma omp for
    for (int i = 0; i < n; i++)
    {
        for (int cnt = 0; cnt < nz; cnt++)
        {
            a[i].func(buf);
            a[i].populateRes(buf);
        }
    }

    delete [] buf;
}
Upvotes: 1