Gene Bushuyev

Reputation: 5538

Does this use of reinterpret_cast invoke undefined behavior?

The basic idea is to create a variable-size array (its size fixed at construction time) and an object of another class in a single allocation, in order to reduce overhead and improve efficiency. A buffer is allocated to fit the array and the other object, and placement new is used to construct them. To access the elements of the array and the other object, pointer arithmetic and reinterpret_cast are used. That seems to work (at least in gcc), but my reading of the standard (5.2.10 Reinterpret cast) tells me it's undefined behavior. Is that correct? And if so, is there any way to implement this design without UB?

Full compilable example is here: http://ideone.com/C9CCa8

// a buffer contains array of A followed by B, laid out like this
// | A[N - 1] ... A[0] | B |

class A
{
    size_t index;
//...
// using reinterpret_cast to get to B object
    const B* getB() const 
    { 
        return reinterpret_cast<const B*>(this + index + 1); 
    }
};

class B
{
    size_t a_count;
//...
    virtual ~B() {}
// using reinterpret_cast to get to the array member
    const A* getA(size_t i) const 
    { 
        return reinterpret_cast<const A*>(this) - i - 1; 
    }
};

// using placement new to construct all objects in raw memory
B* make_record(size_t a_count)
{
    char* buf = new char[a_count*sizeof(A) + sizeof(B)];
    for(size_t i = 0; i < a_count; ++i)
    {
        new(buf) A(a_count - i - 1);
        buf += sizeof(A);
    }
    return new(buf) B(a_count);
}

Upvotes: 2

Views: 666

Answers (4)

Grumpy Coder

Reputation: 11

The sample code you posted does not show problems, because it just happens to have the same alignment requirements for both classes (and uses nice even numbers of objects of class A). I modified your example somewhat to demonstrate what happens if alignof(A) < alignof(B) and you use odd numbers of A: http://ideone.com/eC7l17

Now you get this output:

B starts at 0x9003008, needs alignment 4, misaligned by 0
B has 0 As
B starts at 0x900306a, needs alignment 4, misaligned by 2
B has 1 As
A[]
B starts at 0x90030cc, needs alignment 4, misaligned by 0
B has 2 As
A[]
A[]

and interesting things would happen if you tried to use the misaligned pointer to B (recovered from A[0]).

Avi Berger already suggested a fix. I'll try to come up with a generalized template for arbitrary A and B that will do the right thing.

| A[N - 1] ... A[0] | <padding> | B |

where the padding is computed based on alignof(A) and alignof(B)
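
A minimal sketch of how that padding could be computed, not the generalized template promised above. The helper names align_up, offset_of_b and record_size are hypothetical, and the sketch assumes the buffer itself starts at an address aligned for both A and B, so that a byte offset that is a multiple of alignof(B) gives a correctly aligned B:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): rounds `offset` up to
// the next multiple of `alignment`, which must be a power of two.
inline std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offset of the B object in a buffer that first holds a_count objects of A.
// The padding is offset_of_b<A, B>(n) - n * sizeof(A).
template <class A, class B>
std::size_t offset_of_b(std::size_t a_count)
{
    return align_up(a_count * sizeof(A), alignof(B));
}

// Total number of bytes to allocate for the array of A, the padding, and B.
template <class A, class B>
std::size_t record_size(std::size_t a_count)
{
    return offset_of_b<A, B>(a_count) + sizeof(B);
}
```

With an alignment of 4, an offset of 98 bytes (the "misaligned by 2" case in the output above) would be rounded up to 100. Note that A::getB() would then also have to account for the padding instead of using this + index + 1 directly.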

Upvotes: 1

Raffi

Reputation: 3386

The problem seems to happen when you have one child object that depends on multiple parents (multiple inheritance). In your case, using raw pointers such as

const B* A::getB() const 
{ 
  return (B*)(this + index + 1); 
}

or

const B* A::getB() const 
{ 
  // arithmetic on void* is a compiler extension, so go through const char*
  return (const B*)((const char*)this + sizeof(A) * (index + 1)); 
}

should yield exactly the same pointer arithmetic you want to achieve. What I understood from this doc is (example taken from there):

class Base1 {public: virtual ~Base1() {}};
class Base2 {public: virtual ~Base2() {}};
class Derived: public Base1, public Base2 {public: virtual ~Derived() {}};

// ...
Derived obj;
Derived* dp = &obj;
Base1* b1p = dp;
Base2* b2p = dp; // [1]
Derived* dps = static_cast<Derived*>(b2p); // [2]
Derived* dpr = reinterpret_cast<Derived*>(b2p); // [3]

dp is a pointer to the Derived object, whose layout is basically something like a concatenation of Base1, Base2 and Derived, in that order:

---- address 1: used by Derived and Base1
---- members of Base1: roughly sizeof(Base1)
---- address 2: used by Base2
---- members of Base2: roughly sizeof(Base2)
---- members of Derived 

(though I really think this is completely implementation specific, but it is my understanding of the layout).

If you would like to point to the parent Base2 object within the Derived object, the assignment (line [1]) converts correctly to the address of the Base2 subobject. The static_cast operator (line [2]) gets back to the original value using the hierarchy known at compile time. The reinterpret_cast (line [3]), on the other hand, performs no such adjustment: since it operates on a pointer to Base2, it returns an erroneous pointer to a Derived object in dpr.
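
A small sketch that makes the difference observable, reusing the three classes from the example. The function names static_cast_round_trips and base2_is_offset are hypothetical, and whether the Base2 subobject actually sits at a different address is implementation-specific, so only the static_cast round trip is guaranteed:

```cpp
#include <cassert>

class Base1 { public: virtual ~Base1() {} };
class Base2 { public: virtual ~Base2() {} };
class Derived : public Base1, public Base2 { public: virtual ~Derived() {} };

// Guaranteed by the language: converting to Base2* and back with static_cast
// yields the original Derived*.
inline bool static_cast_round_trips()
{
    Derived obj;
    Derived* dp = &obj;
    Base2* b2p = dp;                            // may adjust the address
    Derived* dps = static_cast<Derived*>(b2p);  // undoes the adjustment
    return dps == dp;
}

// Implementation-specific: true when the Base2 subobject does not start at
// the same address as the Derived object. In that case a reinterpret_cast
// of b2p to Derived* would point into the middle of obj.
inline bool base2_is_offset()
{
    Derived obj;
    Derived* dp = &obj;
    Base2* b2p = dp;
    return static_cast<void*>(b2p) != static_cast<void*>(dp);
}
```

On common implementations base2_is_offset() is expected to return true, which is exactly the situation in which dpr ends up wrong.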

Coming back to your initial question, I do not think you will have any issue as long as there are no dependencies between your two classes in terms of hierarchy. Using casts such as void * (or char *, which, unlike void *, supports pointer arithmetic in standard C++) and explicit pointer arithmetic (sizeof(A)) seems, however, more appropriate to me.

I am curious to know to what extent this actually improves performance compared with having the array of As and a pointer to the unique B.

Upvotes: 0

James Kanze

Reputation: 153929

It's an interesting question. The question is what does this + index + 1 point to. If it really is a B, there should be no problem (assuming that an A* is sufficiently large to contain a B* without loss of value): "Converting a prvalue of type 'pointer to T1' to the type 'pointer to T2' (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value." (§5.2.10/7) Since you've used the same expression (basically) to obtain the address at which you construct the B, the only thing you can legally do with this + index + 1 is to convert it back to a B*.

But since you need the index variable in each element anyway, why not save it as a pointer rather than an index?

And in the end: this is a horrible solution with regard to code readability and robustness. In particular, if B has stricter alignment requirements than A, you can easily end up with the B misaligned. And if you change anything down the road, B might end up with stricter alignment requirements. I'd avoid this solution at all costs.

Upvotes: 2

Neil Kirk

Reputation: 21783

When using placement new, it's up to you to ensure the target memory is properly aligned for your data type, otherwise it is undefined behavior. After an array of A's, it is not guaranteed that the alignment of buf will be correct for an object of type B. Your use of reinterpret_cast is also undefined behavior.
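
One way to respect the alignment requirement is to let std::align (from <memory>) pick the first suitably aligned address before calling placement new. The following is only a sketch under that assumption; the helper name construct_aligned is hypothetical and not part of the question's code:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical helper: constructs a T at the first address in
// [p, p + space) that is suitably aligned for T. Returns nullptr if a T
// does not fit after the alignment adjustment.
template <class T, class... Args>
T* construct_aligned(void* p, std::size_t space, Args&&... args)
{
    assert(p != nullptr);
    // std::align advances p and shrinks space as needed, or returns nullptr.
    void* aligned = std::align(alignof(T), sizeof(T), p, space);
    if (aligned == nullptr)
        return nullptr;
    return ::new (aligned) T(std::forward<Args>(args)...);
}
```

The caller has to reserve enough extra bytes for the worst-case padding (up to alignof(T) - 1) and still has to destroy the object explicitly, as with any placement new.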

Undefined behavior doesn't mean it won't work. It may for a particular compiler, and a particular set of class types and pointer offsets, etc. But you cannot put this code in an arbitrary standard-conformant compiler and guarantee it will work.

Use of these hacks strongly suggests you have not designed your solution properly.

Upvotes: 5
