Phil Miller

Reputation: 38118

Constructing an object with variable amounts of data tacked on

I'm working on a message-passing runtime system whose existing message allocation code looks like this when a message contains variable-length data:

struct MsgBase {
  // Placement form: allocates the object and both trailing arrays as one buffer.
  void* operator new(size_t obj_size, int arr1_size, int arr2_size);
};
struct Msg : MsgBase {
  double *arr1;
  double *arr2;
};
void* MsgBase::operator new(size_t obj_size, int arr1_size, int arr2_size) {
  size_t offsets[2];
  offsets[0] = obj_size;
  offsets[1] = obj_size + sizeof(double)*arr1_size;
  Msg *m = (Msg *)malloc(offsets[1] + sizeof(double)*arr2_size);
  // Point each member at its slice of the trailing buffer.
  m->arr1 = (double *)((char *)m + offsets[0]);
  m->arr2 = (double *)((char *)m + offsets[1]);
  return m;
}

For hardware-interface reasons, the message has to be allocated as one big buffer, and copying into such a buffer after the fact would kill performance [1].

And (lots of) client code that does

Msg *msg = new (12, 17) Msg;
msg->arr1[6] = 543.43;

The problem we've just encountered is that g++ 4.4 (unlike earlier versions) is zeroing out sizeof(Msg) bytes starting at msg before the pointer is returned to us, so those offset pointers into the buffer are not preserved. Thus, the second line of code results in a segfault.

The zeroing doesn't happen if we declare a constructor Msg::Msg(), but my intuition is that the compiler would be within its rights to zero out the allocation before calling the constructor anyway.


Notes: the MsgBase class and attendant operator new() are generated from a client-provided interface definition that says something like

message Msg {
  double arr1[];
  double arr2[];
};

The client code, meanwhile, is responsible for defining Msg itself and ensuring that it inherits from MsgBase. So we can change anything about MsgBase, but pretty much nothing about Msg without forcing that change on existing application code.

  1. Don't tell me to go profile that. We've benchmarked this heavily (we run on several of the top 10 from the top500), and are trying to be zero-copy wherever we can. We currently make no copies here, and regressions would suck.

Upvotes: 0

Views: 97

Answers (1)

Ben Voigt

Reputation: 283674

You're right that anything written into the memory by operator new is not guaranteed to be preserved. So no, you can't retain the current broken call syntax.

Use a factory function and the library-provided placement new. Something like this:

struct Msg
{
  double* const arr1;
  double* const arr2;
private:
  Msg(double* p1, double* p2) : arr1(p1), arr2(p2) {}
  Msg(const Msg&); // deleted copy-constructor
  // having const members prevents assignment operator from being implicitly generated
public:
  static Msg* Create( size_t arr1_len, size_t arr2_len )
  {
      void* raw = ::operator new(sizeof (Msg) + (1 + arr1_len + arr2_len) * sizeof (double));
      // note ugly math to properly align double, assumes sizeof (double) is a power of 2
      // consider using alignof (double) instead of sizeof (double) if your compiler supports it
      double* p = reinterpret_cast<double*>((reinterpret_cast<intptr_t>(raw) + sizeof (Msg) + sizeof (double)) & ~(sizeof (double) - 1));
      return new (raw) Msg(p, p + arr1_len);
  }
};
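
If C++11 is available, the same factory idea can be written with alignof instead of the bit-masking arithmetic. This is only a sketch under that assumption: the Destroy helper and the round-up computation are additions of mine, not part of the code above, and it relies on ::operator new returning storage that is suitably aligned for double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

struct Msg {
  double* const arr1;
  double* const arr2;

  static Msg* Create(std::size_t arr1_len, std::size_t arr2_len) {
    // Round sizeof(Msg) up to a multiple of alignof(double) so the
    // trailing arrays start on a properly aligned address.
    const std::size_t a = alignof(double);
    const std::size_t offset = (sizeof(Msg) + a - 1) / a * a;
    void* raw = ::operator new(offset + (arr1_len + arr2_len) * sizeof(double));
    double* p = reinterpret_cast<double*>(static_cast<char*>(raw) + offset);
    assert(reinterpret_cast<std::uintptr_t>(p) % a == 0);
    // Library-provided placement new: constructs Msg at the front of the buffer.
    return new (raw) Msg(p, p + arr1_len);
  }

  // Hypothetical counterpart to Create: run the destructor, then release
  // the buffer with the same allocator that produced it.
  static void Destroy(Msg* m) {
    m->~Msg();
    ::operator delete(m);
  }

private:
  Msg(double* p1, double* p2) : arr1(p1), arr2(p2) {}
  Msg(const Msg&) = delete;
  Msg& operator=(const Msg&) = delete;
};
```

Client code would then call Msg::Create(12, 17) in place of new (12, 17) Msg, and Msg::Destroy(msg) in place of delete msg.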

NOTE: You could possibly retain the current syntax by using thread-local variables and a constructor... basically your custom operator new would put either the pointers or the sizes into the TLS, then the constructor would read the TLS and set the pointers appropriately. I think I would pass the sizes, since the compiler might be asking operator new for a little extra memory and stuffing debugging information in front of the actual object.
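
A rough sketch of that thread-local idea, passing the sizes rather than the pointers. It assumes C++11 thread_local (older compilers would need something like __thread) and that Msg can be given a default constructor. PendingSizes, pending and trailing_offset are names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical per-thread hand-off slot: written by operator new, read by
// the constructor that runs on the same thread right afterwards.
struct PendingSizes {
  std::size_t arr1_size;
  std::size_t arr2_size;
};
static thread_local PendingSizes pending = {0, 0};

// Hypothetical helper: object size rounded up to the alignment of double.
inline std::size_t trailing_offset(std::size_t obj_size) {
  const std::size_t a = alignof(double);
  return (obj_size + a - 1) / a * a;
}

struct MsgBase {
  // Only records the sizes and allocates; writes nothing into the object.
  static void* operator new(std::size_t obj_size, int arr1_size, int arr2_size) {
    pending.arr1_size = static_cast<std::size_t>(arr1_size);
    pending.arr2_size = static_cast<std::size_t>(arr2_size);
    void* p = std::malloc(trailing_offset(obj_size) +
                          sizeof(double) * (pending.arr1_size + pending.arr2_size));
    if (!p) throw std::bad_alloc();
    return p;
  }
  // Matching placement delete (used if the constructor throws) and the
  // usual delete that a plain `delete msg` resolves to.
  static void operator delete(void* p, int, int) { std::free(p); }
  static void operator delete(void* p) { std::free(p); }
};

struct Msg : MsgBase {
  double* arr1;
  double* arr2;
  // The pointers are set here, after any zeroing the compiler may do.
  Msg() {
    char* base = reinterpret_cast<char*>(this);
    arr1 = reinterpret_cast<double*>(base + trailing_offset(sizeof(Msg)));
    arr2 = arr1 + pending.arr1_size;
    assert(reinterpret_cast<std::uintptr_t>(arr1) % alignof(double) == 0);
  }
};
```

With this in place the existing call sites, new (12, 17) Msg followed by msg->arr1[6] = 543.43, keep compiling unchanged. The cost is that the pointer setup moves into a Msg constructor, so it only helps if that constructor can be added to the client classes.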

Upvotes: 1
