VainMan


Class layout: Do all objects created from the same most derived polymorphic class type share a unique memory layout?

From https://en.cppreference.com/w/cpp/language/data_members#Layout:

Layout: When an object of some class C is created, each non-static data member of non-reference type is allocated in some part of the object representation of C.

Q1: Is this also true for subobjects of (possibly virtual) base classes and for the non-static data members of those base classes?

Q2: If the answer to Q1 is yes, is that layout identical for every object of the most derived class? For example, do all objects created from class C share a single layout (e.g. data member offsets and vtable for virtual bases), even across multiple compilation units?

Q3: If the answer to Q2 is yes, is it safe to access data members by adding an offset to the address of an object of the most derived class? (I guess it is, because casting a non-function pointer to char* and then casting back would be safe.)

For more detail, and more generally, is the following piece of code guaranteed to run safely? (Note: as this code is a bit long, you don't have to read it unless you need more detail about the questions. You can also leave a comment and let me edit it to be clearer.)

Thanks.

// test_layout.cpp

#include <cassert>
#include <cstdint>
#include <cstdlib>   // std::abort
#include <iostream>
#include <new>       // placement new
#include <utility>
#include <vector>

template<class T>
char* cast(T* p) {
  return reinterpret_cast<char*>(p);
}

void on_constructing_X(char*);

class X {
public:
  X() {
    x_ = ++counter_;
    // do init
    on_constructing_X(cast(this));
  }

  void do_something() {
    // bala bala
    std::cout << "X@" << this << " " << x_ << std::endl;
  }

private:
  int x_;
  // bala bala

  static int counter_;
};
int X::counter_ = 0;

struct Info {
  char* begin;
  char* end;
  std::vector<long>* offsets;
  bool init;
};

static std::vector<Info> constructing_wrapper;

template<typename T>
class Wrapper {
private:
  union {
    T data_;
    char dummy_;
  };
  static bool init_;
  static std::vector<long> offsets_;

private:
  template<typename...Args>
  Wrapper(Args&&...args): dummy_(0) {
    std::cout << "constructing Wrapper at " << this << std::endl;
    constructing_wrapper.push_back({cast(this), cast(this) + sizeof(*this), &offsets_, init_});
    new (&data_) T(std::forward<Args>(args)...);
    constructing_wrapper.pop_back();
    init_ = true;
  }

public:
  ~Wrapper() {
    data_.~T();
  }

  template<typename...Args>
  static Wrapper* Make(Args&&...args) {
    return new Wrapper(std::forward<Args>(args)...);
  }

  template<typename F>
  void traversal_X(F&& f) {
    for (auto off: offsets_) {
      f(reinterpret_cast<X*>(cast(this) + off));
    }
  }

};

template<typename T>
bool Wrapper<T>::init_ = false;

template<typename T>
std::vector<long> Wrapper<T>::offsets_;

void on_constructing_X(char* x) {
  if (!constructing_wrapper.empty()) {
    auto i = constructing_wrapper.back();
    if (i.begin <= x && x + sizeof(X) <= i.end) {
      if (!i.init) {
        i.offsets->push_back(x - i.begin);
      } else {
        bool found = false;
        for (auto off: *i.offsets) {
          if (x - i.begin == off) {
            found = true;
            break;
          }
        }
        if (!found) {
          std::cout << "Error" << std::endl;
          std::abort();
        }
      }
    }
  }
}

namespace test {
  class B { X xb; };
  class D1: B { X xd1; };
  class D2: protected virtual B { X xd2; };
  class D3: protected virtual B { X xd3; };
  class DD: D1, D2, D3 { X xdd; };

  void test() {
    for (int i = 0; i < 2; ++i) {
      auto p = Wrapper<D2>::Make();
      p->traversal_X([](X* x) {x->do_something();});
      delete p;
    }
    for (int i = 0; i < 2; ++i) {
      auto p = Wrapper<DD>::Make();
      p->traversal_X([](X* x) {x->do_something();});
      delete p;
    }
  }
}

int main() {
  test::test();
  return 0;
}


Answers (1)

Brian Bi

By definition, the object representation contains all the data stored in the object, including all non-static data members and base classes, whether virtual or not. The definition of object representation in C++17 is:

the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).

Clearly, any bytes that are used to store subobjects are "taken up" by the object, since the subobject is part of the object. That means those bytes are part of the object representation. This should hopefully answer your Q1.
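
As a rough illustration of this point, the sketch below checks that the bytes of a virtual base subobject and of its data member fall inside the sizeof(D) bytes of the complete object. The types B and D and the helper lies_within are made up for this example. The comparison goes through std::uintptr_t, whose mapping from pointers is implementation-defined, so this is a demonstration for ordinary flat-address platforms and not a portable proof.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only.
struct B { int b = 1; };
struct D : virtual B { int d = 2; };

// Hypothetical helper: true if the sizeof(Inner) bytes starting at `inner`
// lie inside the sizeof(Outer) bytes starting at `outer`.
// Assumes a flat address space where uintptr_t values order like addresses.
template <class Outer, class Inner>
bool lies_within(const Outer* outer, const Inner* inner) {
    const auto outer_begin = reinterpret_cast<std::uintptr_t>(outer);
    const auto inner_begin = reinterpret_cast<std::uintptr_t>(inner);
    return outer_begin <= inner_begin &&
           inner_begin + sizeof(Inner) <= outer_begin + sizeof(Outer);
}

// Checks the virtual base subobject, its member, and D's own member.
bool subobjects_lie_within_complete_object() {
    D obj;
    const B* base = &obj;  // implicit conversion to the virtual base
    return lies_within(&obj, base) &&
           lies_within(&obj, &base->b) &&
           lies_within(&obj, &obj.d);
}

// Usage: assert(subobjects_lie_within_complete_object());
```

This only shows where the subobjects sit in one object. It says nothing about whether a second object of type D would use the same offsets.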

For Q2, I don't think that the standard guarantees that all complete objects of the same type will have the same layout, unless the type is standard-layout. In practice I think it would be unusual to find an implementation where complete objects of the same type could have different layouts. If every translation unit sees the same definition of the class (which it should, otherwise you are violating the ODR) then the compiler should have no problem with generating the same layout in each translation unit, and this just seems like the sensible thing to do (otherwise you may have to generate multiple vtables). But, if for some reason an implementation did want to vary the layout, I think that it could do so, even within a single translation unit.
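
To make the standard-layout distinction concrete, here is a small sketch that tests two made-up types with std::is_standard_layout and uses offsetof only on the one that qualifies. The type names and the helper offset_of_b are assumptions for illustration. The static_asserts reflect the language rule that a class with a virtual base is not standard-layout.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical types for illustration only.
struct Plain { int a; double b; };                // standard-layout
struct Base {};
struct WithVirtualBase : virtual Base { int a; }; // virtual base: not standard-layout

static_assert(std::is_standard_layout<Plain>::value,
              "Plain is expected to be standard-layout");
static_assert(!std::is_standard_layout<WithVirtualBase>::value,
              "a class with a virtual base is not standard-layout");

// offsetof is only guaranteed to be supported for standard-layout types,
// so it is applied to Plain and deliberately not to WithVirtualBase.
std::size_t offset_of_b() {
    return offsetof(Plain, b);
}

// Usage: assert(offset_of_b() >= sizeof(int));
```

For Plain, the first member sits at offset 0 and b follows it, so offset_of_b() is at least sizeof(int). No comparable statement is made here for WithVirtualBase.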

But in addition I would question whether pointer arithmetic would be allowed, even if the layout were guaranteed! If T is neither a standard-layout nor trivially copyable type, then it's not clear to me whether you're even allowed to do pointer arithmetic within an object of type T, using a char* that points to one of the bytes of T. Consider for example that offsetof is only guaranteed to be supported for standard-layout types, and memcpying an object into a byte array and back is only guaranteed to be well-defined for trivially copyable types. Let's say a type has a virtual base class, making it neither standard-layout nor trivially copyable. I am not sure whether this code is well-defined:

struct B {};
struct D : virtual B {};
auto foo() {
    D d;
    B* pb = &d;
    return reinterpret_cast<char*>(pb) - reinterpret_cast<char*>(&d);
}

If it's not even well-defined to call foo at all, then the question of whether it always returns the same value is obviously moot. Perhaps offsets within these types are just not observable (according to the rules of the abstract machine).
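
If the goal is only to reach a base subobject, one way to sidestep the question is to let the language perform the conversion each time instead of caching a byte offset. The sketch below reuses the shape of B and D from above, with a data member added to B purely so there is something to read. The function name read_base_value is hypothetical.

```cpp
#include <cassert>

// Same shape as the example above; `value` is added only for illustration.
struct B { int value = 7; };
struct D : virtual B {};

// The implicit derived-to-base conversion locates the virtual base for this
// particular object, so no offset is ever observed or stored by user code.
int read_base_value(D& d) {
    B* pb = &d;
    return pb->value;
}

// Usage: D d; assert(read_base_value(d) == 7);
```

This does not cover the Wrapper use case in the question, which wants to find every X subobject without knowing the type's structure. It only shows that access to a known base does not need pointer arithmetic.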
