Nikolai

Reputation: 1549

OpenACC vs C++: FATAL ERROR: variable is partially present on the device

I'm trying to port a C++ application to the GPU using OpenACC. As one would expect, the C++ code has a lot of encapsulation and abstraction. Memory is allocated in a vector-like class, and that class is then reused in many other classes around the application. I'm having trouble inserting OpenACC pragmas into the code correctly. Here is a simplified example of the code I'm working on:

#define DATASIZE 16

class Data {
  float *arr;
public:
  Data() {arr = new float[DATASIZE];}
  ~Data() { delete [] arr; }
  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;
public:
  void init() {
    for (int i = 0; i < DATASIZE; ++i)
      a.get(i) = 0.0;
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

I insert some OpenACC pragmas to send the necessary data to the device and end up with code like this:

#define DATASIZE 16

class Data {
  float *arr;

public:
  Data() {
    arr = new float[DATASIZE];
#pragma acc enter data copyin(this)
#pragma acc enter data create(arr[:DATASIZE])
  }

  ~Data() {
#pragma acc exit data delete(arr)
#pragma acc exit data delete(this)
    delete [] arr;
  }

  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;

public:
  DataKeeper() {
#pragma acc enter data copyin(this)
  }

  ~DataKeeper() {
#pragma acc exit data delete(this)
  }

  void init() {
#pragma acc parallel loop
    for (int i = 0; i < DATASIZE; ++i) {
      a.get(i) = 0.0;
    }
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

But after compiling and running it I get the following error:

$ pgc++ test.cc -acc -g

$ ./a.out 
_T24395416_101 lives at 0x7fff49e03070 size 24 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 3.5, threadid=1
host:0x1ae6eb0 device:0xc05ca0200 size:64 presentcount:0+1 line:11 name:(null)
host:0x1f33620 device:0xc05ca0600 size:64 presentcount:0+1 line:11 name:(null)
host:0x1f33d10 device:0xc05ca0a00 size:64 presentcount:0+1 line:11 name:(null)
host:0x7fff49e03070 device:0xc05ca0000 size:8 presentcount:0+1 line:11 name:_T24395600_98
host:0x7fff49e03078 device:0xc05ca0400 size:8 presentcount:0+1 line:11 name:_T24395600_98
host:0x7fff49e03080 device:0xc05ca0800 size:8 presentcount:0+1 line:11 name:_T24395600_98
allocated block device:0xc05ca0000 size:512 thread:1
allocated block device:0xc05ca0200 size:512 thread:1
allocated block device:0xc05ca0400 size:512 thread:1
allocated block device:0xc05ca0600 size:512 thread:1
allocated block device:0xc05ca0800 size:512 thread:1
allocated block device:0xc05ca0a00 size:512 thread:1

FATAL ERROR: variable in data clause is partially present on the device: name=_T24395416_101
 file:/home/bozhenovn/tst/test.cc _ZN10DataKeeperC1Ev line:27

I have no idea what is wrong with the code. I would appreciate any ideas on how to fix it, or suggestions on how to investigate the problem further. Thank you!

Upvotes: 0

Views: 1474

Answers (1)

Mat Colgrove

Reputation: 5646

What's going on here is that the host address of "a" is the same as the starting address of "DK". Hence when the compiler looks up the host address in the present table, which it uses to map the variable's host address to the device address, it sees that the size is different. "a" is size 8 while "DK" is size 24.

I'll show the fix below, but let's backtrack and understand what's happening here. When "DK" is created on the host, it first creates storage for each of its data members and then invokes each data member's class constructor. It then executes its own constructor. Hence for each data member, your code will create the class's this pointer on the device and then allocate the "arr" array on the device. After this is done, "DK" is created on the device with space for each data member. However, since the device copy of "DK" is created after the data members, the compiler can't automatically associate the two.

Below, I've posted two possible fixes.

First, you can have the "Data" class manage its own data, but you'll need to dynamically allocate the class data members. This way each Data constructor runs after DataKeeper's "this" has been copied to the device, so the compiler can associate the device data (also called "attach").

Second, you can have the DataKeeper class manage the Data class's data. However, this will require Data's data to be public.

Note that I wrote Chapter 5 "Advanced Data Management" of the book "Parallel Programming with OpenACC" and included a section about C++ class data management. You can find my example code at: https://github.com/rmfarber/ParallelProgrammingWithOpenACC/tree/master/Chapter05 In particular, look over how I did the generic container class, "accList".

Fix #1:

#define DATASIZE 16
#include <iostream>
#ifdef _OPENACC
#include <openacc.h>
#endif

class Data {
  float *arr;

public:
  Data() {
    arr = new float[DATASIZE];
#pragma acc enter data copyin(this)
#pragma acc enter data create(arr[:DATASIZE])
  }

  ~Data() {
#pragma acc exit data delete(arr)
#pragma acc exit data delete(this)
    delete [] arr;
  }

  float &get(int i) { return arr[i]; }
  void updatehost() {
#pragma acc update host(arr[0:DATASIZE])
  }
};

class DataKeeper {
  Data *a, *b, *c;

public:
  DataKeeper() {
#pragma acc enter data copyin(this)
    a = new Data;
    b = new Data;
    c = new Data;
  }

  ~DataKeeper() {
#pragma acc exit data delete(this)
    delete a;
    delete b;
    delete c;
  }

  void init() {
#pragma acc parallel loop present(a,b,c)
    for (int i = 0; i < DATASIZE; ++i) {
      a->get(i) = i;
    }
    a->updatehost();
    std::cout << "a.arr[0]=" << a->get(0) << std::endl;
    std::cout << "a.arr[end]=" << a->get(DATASIZE-1) << std::endl;
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

Fix #2:

#define DATASIZE 16
#include <iostream>
#ifdef _OPENACC
#include <openacc.h>
#endif

class Data {
public:
  float *arr;

  Data() {
    arr = new float[DATASIZE];
  }

  ~Data() {
    delete [] arr;
  }

  float &get(int i) { return arr[i]; }
};

class DataKeeper {
  Data a, b, c;

public:
  DataKeeper() {
#pragma acc enter data copyin(this)
#pragma acc enter data create(a.arr[0:DATASIZE])
#pragma acc enter data create(b.arr[0:DATASIZE])
#pragma acc enter data create(c.arr[0:DATASIZE])
  }

  ~DataKeeper() {
#pragma acc exit data delete(this)
#pragma acc exit data delete(a.arr)
#pragma acc exit data delete(b.arr)
#pragma acc exit data delete(c.arr)
  }

  void init() {
#pragma acc parallel loop present(a,b,c)
    for (int i = 0; i < DATASIZE; ++i) {
      a.get(i) = i;
    }
#pragma acc update host(a.arr[0:DATASIZE])
    std::cout << "a.arr[0]=" << a.arr[0] << std::endl;
    std::cout << "a.arr[end]=" << a.arr[DATASIZE-1] << std::endl;
  }
};

int main() {
  DataKeeper DK;
  DK.init();
}

Upvotes: 3
