Reputation: 1838
I have the following code:
int main(int argc, char** argv )
{
    std::vector< std::vector< std::vector<double> > > vec {
        {{1,2},{3,4}, {5,6},{7,8}},
        {{9,10}, {11,12}},
        {{13,14}, {15,16}, {17,18}} };
    #pragma acc parallel loop
    for (int k = 0; k <3; k++) {
        std::vector<std::vector<double>>& vec2d = vec[k];
        int L = vec2d.size();
        //std::vector<int>dVec{67,51,1,0,50};
        std::vector<double>dVec(L, 0.0);
        for (int i = 0; i < L; i++)
        {
            dVec[i] = vec2d[i][1] - vec2d[i][0];
        }
        for (int j=0; j<2; j++) {
            printf("k: %d j: %d vec0: %f, vec1: %f\n", k, j, vec2d[j][0], vec2d[j][1]);
        }
    }
    std::cout<<"finished\n";
    return 0;
}
and I compile and run it with pgc++ -fast -ta=tesla:cuda9.2,managed -o runEx runEx.cpp -std=c++17 && ./runEx
If I comment out the #pragma acc parallel loop, then it works. But if I leave it there, I get the error:
PGCC-S-0155-Procedures called in a compute region must have acc routine information: operator delete (void *) (runEx.cpp: 425)
PGCC-S-0155-Accelerator region ignored; see -Minfo messages (runEx.cpp: 6)
PGCC/x86-64 Linux 19.10-0: compilation completed with severe errors
Also, if I comment out the declaration of dVec and the for loop that fills it, then the code works even with the #pragma acc parallel loop.
However, if I change the loop so it becomes just:
#pragma acc parallel loop
for (int k = 0; k <3; k++) {
std::vector<int>dVec{67,51,1,0,50};
}
then I get the same error.
Why is this?
Upvotes: 0
Views: 319
Reputation: 1279
The problem here is that std::vector has some member functions that are not available on the device. The compiler error specifically calls out operator delete, but I suspect that the size function will also be problematic, as well as the constructor. Since you don't control the source of std::vector, it's not possible for you to add an acc routine directive to its member functions. The workaround I've used in the past is to strictly use vector outside of OpenACC regions and pass a raw pointer to its data into the regions. It's a hassle for sure, especially for large codes, but it works. Otherwise, you might also try implementing your own minimal vector class that does decorate its member functions with acc routine. I've seen this done successfully as well.
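To make the raw-pointer workaround concrete, here is a minimal sketch. It assumes each block has been flattened into one contiguous std::vector<double> holding two values per row; the function name rowDifferences and that layout are illustrative assumptions, not something taken from the question's code. The point is only that the vector is created and destroyed on the host, and the compute region touches nothing but plain pointers.

```cpp
#include <vector>

// Hypothetical helper: for a flat array of two-column rows, returns
// row[1] - row[0] for every row. All std::vector calls happen on the host.
std::vector<double> rowDifferences(const std::vector<double>& rows)
{
    const int nRows = static_cast<int>(rows.size() / 2);
    std::vector<double> diff(nRows, 0.0);

    // Take raw pointers before entering the region, so no std::vector
    // member function (size, operator[], destructor) runs on the device.
    const double* in = rows.data();
    double* out = diff.data();

    #pragma acc parallel loop copyin(in[0:2*nRows]) copyout(out[0:nRows])
    for (int i = 0; i < nRows; i++) {
        out[i] = in[2 * i + 1] - in[2 * i];
    }
    return diff;
}
```

Calling it with a flattened copy of the first block, for example a vector holding 1,2,3,4,5,6,7,8, should give four differences. Without an OpenACC compiler the pragma is ignored and the loop simply runs on the host.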
For a very large L, you might be able to get by with putting your acc parallel loop solely on your i loop, but you'll be copying data back and forth a lot unless you hoist your arrays outside of the k loop to enable reuse.
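As a rough sketch of that hoisting idea: the version below keeps the k loop on the host, parallelizes only the i loop, and allocates one scratch buffer (sized for the largest block) once, outside the k loop. The function name blockDifferences, the flattened two-values-per-row layout, and the choice of data clauses are my assumptions for illustration; I have not tuned or profiled this.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: each entry of `blocks` is a flat array of two-column
// rows; returns row[1] - row[0] for every row of every block.
std::vector<std::vector<double>>
blockDifferences(const std::vector<std::vector<double>>& blocks)
{
    // Size the scratch buffer for the largest block so it can be reused.
    std::size_t maxRows = 0;
    for (const auto& b : blocks) {
        maxRows = std::max(maxRows, b.size() / 2);
    }
    std::vector<double> scratch(maxRows, 0.0);   // allocated once, on the host
    double* out = scratch.data();
    const int cap = static_cast<int>(maxRows);

    std::vector<std::vector<double>> result;

    // The device copy of the scratch buffer lives for the whole k loop.
    #pragma acc data create(out[0:cap])
    {
        for (std::size_t k = 0; k < blocks.size(); k++) {   // host loop
            const double* in = blocks[k].data();
            const int nRows = static_cast<int>(blocks[k].size() / 2);

            #pragma acc parallel loop copyin(in[0:2*nRows]) present(out[0:cap])
            for (int i = 0; i < nRows; i++) {
                out[i] = in[2 * i + 1] - in[2 * i];
            }

            // Bring back only the part of the buffer this block used.
            #pragma acc update self(out[0:nRows])
            result.emplace_back(out, out + nRows);
        }
    }
    return result;
}
```

Each block's input is still copied in once per iteration, which is unavoidable if the blocks differ; what the hoisting saves is the repeated allocation and transfer of the output buffer.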
Upvotes: 2
Reputation: 626
From the site: https://docs.computecanada.ca/wiki/OpenACC_Tutorial_-_Optimizing_loops
I can see two things:
#pragma acc parallel loop
is missing parameters.
Upvotes: -1