Reputation: 3
Please forgive me if this is a stupid question, but I failed to find any similar one.
I want to assign values to a 3D dynamic array in C++, with OpenMP parallelizing only the first loop.
int i, j, k;
int ***data;
const int NEL = 100;
const int NINT = 2;
data = new int**[NEL];
for (i = 0; i < NEL; i++) {
    data[i] = new int*[NINT*NINT*NINT];
    for (j = 0; j < NINT*NINT*NINT; j++) {
        data[i][j] = new int[NINT*NINT*NINT];
    }
}
#pragma omp parallel for
for (i = 0; i < NEL; i++) {
    for (j = 0; j < NINT*NINT*NINT; j++) {
        for (k = 0; k < NINT*NINT*NINT; k++) {
            data[i][j][k] = 1;
        }
    }
}
I only want the outermost loop (i) to execute in parallel, with the nested loops (j and k) executing sequentially. But the program throws an access violation error at runtime every time.
If I change the dynamic array to a local array, it works with no problem.
int i, j, k;
const int NINT = 2;
const int NEL = 100;
int data[NEL][NINT*NINT*NINT][NINT*NINT*NINT];
#pragma omp parallel for
for (i = 0; i < NEL; i++) {
    for (j = 0; j < NINT*NINT*NINT; j++) {
        for (k = 0; k < NINT*NINT*NINT; k++) {
            data[i][j][k] = 123;
        }
    }
}
I'm using Visual Studio 2015 with OpenMP support enabled. Is it because the OpenMP version in VS 2015 is only 2.0, or am I not using the dynamic array with OpenMP correctly?
Upvotes: 0
Views: 1024
Reputation: 793
You need to declare the loop variables within the parallel region; the simplest way is
#pragma omp parallel for
for (int i = 0; i < NEL; i++) {
    for (int j = 0; j < NINT*NINT*NINT; j++) {
        for (int k = 0; k < NINT*NINT*NINT; k++) {
            data[i][j][k] = 1;
        }
    }
}
Otherwise, the inner loop variables j and k will be shared by default, and the resulting data races can lead to out-of-bounds accesses into data.
In general, it is preferable here to use a std::vector:
std::vector<std::vector<std::vector<int>>> data;
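For illustration, a minimal sketch of how the nested vectors could be sized and filled in parallel (the constants match the question; the helper constant N3 and the sizing constructor call are my own additions, not the asker's code):

#include <vector>

int main() {
    const int NEL = 100;
    const int NINT = 2;
    const int N3 = NINT*NINT*NINT;

    // Allocate NEL x N3 x N3 ints, zero-initialized by the vector constructors.
    std::vector<std::vector<std::vector<int>>> data(
        NEL, std::vector<std::vector<int>>(N3, std::vector<int>(N3, 0)));

    #pragma omp parallel for
    for (int i = 0; i < NEL; i++) {
        for (int j = 0; j < N3; j++) {
            for (int k = 0; k < N3; k++) {
                data[i][j][k] = 1;  // each thread touches only its own i-slices
            }
        }
    }
    return 0;
}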
And if you opt for performance, you want to use contiguous memory:
std::vector<int> data;
and then access data by building the index on the fly:
data[k + NINT*NINT*NINT*(j + NINT*NINT*NINT*i)] = 1;
It is best to use a small indexing function here to make access to data easier:
int dataIndex(int i, int j, int k, int NINT) { return k + NINT*NINT*NINT*(j + NINT*NINT*NINT*i); }
and then access data as
data[dataIndex(i, j, k, NINT)] = 1;
The compiler will most likely inline the function, so there is no extra cost for the function call.
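Putting the pieces together, a minimal self-contained sketch of the flat-vector variant might look like this (the constants match the question; the layout in dataIndex is row-major with i slowest-varying, which is an illustrative choice, not the asker's code):

#include <vector>

int dataIndex(int i, int j, int k, int NINT) {
    // row-major layout: i is the slowest-varying index, k the fastest
    return k + NINT*NINT*NINT*(j + NINT*NINT*NINT*i);
}

int main() {
    const int NEL = 100;
    const int NINT = 2;

    // one contiguous block of NEL * NINT^3 * NINT^3 ints
    std::vector<int> data(NEL * NINT*NINT*NINT * NINT*NINT*NINT, 0);

    #pragma omp parallel for
    for (int i = 0; i < NEL; i++) {
        for (int j = 0; j < NINT*NINT*NINT; j++) {
            for (int k = 0; k < NINT*NINT*NINT; k++) {
                data[dataIndex(i, j, k, NINT)] = 1;
            }
        }
    }
    return 0;
}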
Upvotes: 2
Reputation: 19706
Try changing the pragma to:
#pragma omp parallel for shared(data) private(i,j,k)
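For reference, applied to the loop from the question this would look roughly as follows (only the pragma clauses change; the loop body is the asker's):

#pragma omp parallel for shared(data) private(i, j, k)
for (i = 0; i < NEL; i++) {
    for (j = 0; j < NINT*NINT*NINT; j++) {
        for (k = 0; k < NINT*NINT*NINT; k++) {
            data[i][j][k] = 1;
        }
    }
}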
However, to expand on my comment: you are splitting the work across many threads but giving each of them a tiny amount of work, which means the scheduling overhead will be very large compared to the actual benefit.
On top of that, 2*2*2*sizeof(int) is likely smaller than a cache line on most systems, meaning that two threads would probably try to write to the same line simultaneously, causing false sharing and ping-ponging the line across caches.
Upvotes: 1