Xeo

Reputation: 131887

Does the 'offsetof' macro from <stddef.h> invoke undefined behaviour?

Example from MSVC's implementation:

#define offsetof(s,m) \
    (size_t)&reinterpret_cast<const volatile char&>((((s *)0)->m))
//                                                   ^^^^^^^^^^^

As can be seen, it dereferences a null pointer, which normally invokes undefined behaviour. Is this an exception to the rule or what is going on?
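For comparison, portable code never writes that null-pointer expression itself. It includes <cstddef> and applies offsetof to a standard-layout type. Below is a minimal sketch with a hypothetical struct Packet. It checks only properties the language guarantees, not exact offsets, because those depend on padding:

```cpp
#include <cstddef>   // offsetof

// Hypothetical standard-layout struct used only for illustration.
struct Packet {
    char tag;
    int length;
    double payload;
};

// The first member of a standard-layout class is at offset 0.
static_assert(offsetof(Packet, tag) == 0, "first member starts the object");

// Later members have higher offsets; the exact values depend on padding.
static_assert(offsetof(Packet, length) < offsetof(Packet, payload),
              "members are laid out in declaration order");
```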

Upvotes: 18

Views: 3977

Answers (6)

personal_cloud

Reputation: 4524

It is NOT undefined behavior in C++ if m is at offset 0 within the structure s, and likewise in certain other cases. According to the proposed resolution of C++ core language issue 232 (emphasis mine):

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points, if any. If the pointer is a null pointer value (7.11 [conv.ptr]) or points one past the last element of an array object (8.7 [expr.add]), the result is an empty lvalue and does not refer to any object or function. An empty lvalue is not modifiable.

Therefore, the expression &((s *)0)->m is undefined behavior only if m is neither at offset 0 nor at an offset corresponding to an address one past the last element of an array object. Note that adding an offset of 0 to a null pointer is allowed in C++ but not in C.
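The C++ rule about adding 0 to a null pointer can be shown directly. Below is a minimal sketch. The function name null_plus_zero_is_null is hypothetical. The code relies only on the [expr.add] guarantee that null + 0 yields a null pointer, which C does not provide:

```cpp
#include <cassert>   // assert, for the usage line below

// In C++, adding 0 to a null pointer is well-defined and yields a null
// pointer ([expr.add]); C has no corresponding guarantee.
inline bool null_plus_zero_is_null() {
    const char* p = nullptr;
    const char* q = p + 0;   // defined in C++, no object is accessed
    return q == nullptr;
}

// Usage: assert(null_plus_zero_is_null());
```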

As others have noted, the compiler is allowed (and extremely likely) to guarantee that this expression never actually produces undefined behavior, and it may be packaged with libraries that rely on that specific compiler's extended guarantees.

Upvotes: 0

supercat

Reputation: 81307

When the C Standard specifies that certain actions invoke Undefined Behavior, that has not generally meant that such actions were forbidden, but rather that implementations were free to specify the consequent behaviors or not as they saw fit. Consequently, implementations are free to perform such actions in cases where the Standard requires defined behavior, if and only if they can guarantee that the resulting behaviors will be consistent with what the Standard requires. Consider, for example, the following implementation of strcpy:

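#include <stddef.h>  /* ptrdiff_t */
char *strcpy(char *dest, char const *src)
{
  /* Only meaningful where p1 + (p2-p1) == p2 holds for unrelated pointers. */
  ptrdiff_t diff = dest-src-1;
  int ch;
  do {
    ch = *src++;
    /* src is now one past the char just read, so src + diff is the matching
       slot in dest; the cast drops const because that slot is writable. */
    ((char *)src)[diff] = (char)ch;
  } while (ch != 0);  /* copies the terminating '\0' as well */
  return dest;
}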

If src and dest are unrelated pointers, the computation of dest-src yields Undefined Behavior. On some platforms, however, the relation between char* and ptrdiff_t is such that, given any char* p1 and p2, the computation p1 + (p2-p1) will always equal p2. On platforms which make that guarantee, the above implementation of strcpy would be legitimate (and on some such platforms might be faster than any plausible alternative). On some other platforms, however, such a function might always fail except when both strings are part of the same allocated object.
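For contrast, a version that is portable to every conforming implementation never forms dest-src and simply advances both pointers. Below is a minimal sketch. The name portable_strcpy is hypothetical and was chosen so it does not collide with the reserved name strcpy:

```cpp
#include <cassert>   // assert, for the usage lines below
#include <cstring>   // std::strcmp, for the usage lines below

// Copies src, including its terminating '\0', into dest.
// Each pointer stays within its own buffer, so no subtraction between
// unrelated pointers is ever performed.
char* portable_strcpy(char* dest, const char* src) {
    char* out = dest;
    while ((*out++ = *src++) != '\0') {
        // the assignment in the condition does the copying
    }
    return dest;
}

// Usage:
//   char buf[8];
//   assert(portable_strcpy(buf, "abc") == buf);
//   assert(std::strcmp(buf, "abc") == 0);
```

The trade-off is the one the answer describes. This version is correct everywhere, while the single-pointer version above may be faster only on platforms that guarantee its pointer arithmetic.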

The same principle applies to the offsetof macro. There is no requirement that compilers offer any way to get behavior equivalent to offsetof other than by actually using that macro. If a compiler's model for pointer arithmetic makes it possible to get the required offsetof behavior by using the -> operator on a null pointer, then its offsetof macro can do that. If a compiler wouldn't support any attempt to use -> on something other than a legitimate pointer to an instance of the type, then it may need to define an intrinsic which can compute a field offset, and define the offsetof macro to use that. What matters is not that the Standard define the behaviors of actions performed inside standard-library macros and functions, but rather that the implementation ensures that the behaviors of those macros and functions match the requirements.
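GCC and Clang illustrate the intrinsic route. Both provide __builtin_offsetof, and their <stddef.h> defines offsetof in terms of it. Below is a minimal sketch, assuming one of those two compilers. The struct Sample and the macro MEMBER_OFFSET are hypothetical:

```cpp
#include <cstddef>   // offsetof

struct Sample {      // hypothetical standard-layout type
    int id;
    double value;
};

#if defined(__GNUC__) || defined(__clang__)
// Compiler intrinsic: computes the offset with no null-pointer expression.
#define MEMBER_OFFSET(type, member) __builtin_offsetof(type, member)
#else
// Elsewhere, defer to the standard macro and let the library decide.
#define MEMBER_OFFSET(type, member) offsetof(type, member)
#endif

static_assert(MEMBER_OFFSET(Sample, id) == 0, "first member is at offset 0");
static_assert(MEMBER_OFFSET(Sample, value) == offsetof(Sample, value),
              "intrinsic and library macro agree");
```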

Upvotes: 4

Ben Voigt

Reputation: 283863

This is basically equivalent to asking whether this is UB:

s* p = 0;
volatile auto& r = p->m;

Clearly no memory access is generated to the target of r, because it's volatile and the compiler is prohibited from generating spurious accesses to volatile variables. But *p is not volatile, so the compiler could possibly generate an access to it. Neither the address-of operator nor a cast to reference type creates an unevaluated context according to the standard.

So I don't see any reason for the volatile, and I agree with the others that this is undefined behavior according to the standard. Of course, any compiler is permitted to define behavior where the standard leaves it implementation-defined or undefined.

Finally, a note in section [dcl.ref] says:

in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
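If the goal is only to avoid the null-pointer expression, one alternative is to take the member's address inside a real object. Below is a minimal sketch, assuming the type is standard-layout and default-constructible. The helper offset_in_object and the struct Rec are hypothetical. The byte-pointer subtraction is the same technique offsetof implementations have traditionally relied on, so treat it as a sketch rather than a replacement for the standard macro:

```cpp
#include <cassert>   // assert, for the usage lines below
#include <cstddef>   // std::ptrdiff_t, offsetof

struct Rec {         // hypothetical standard-layout type
    char flag;
    long count;
};

// Measures a member's offset using a live object of type T, so no null
// pointer is ever dereferenced. Requires T to be default-constructible.
template <typename T, typename M>
std::ptrdiff_t offset_in_object(M T::*member) {
    T obj{};
    const char* base  = reinterpret_cast<const char*>(&obj);
    const char* field = reinterpret_cast<const char*>(&(obj.*member));
    return field - base;
}

// Usage: assert(offset_in_object(&Rec::flag) == 0);
```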

Upvotes: 1

AnT stands with Russia

Reputation: 320739

The notion of "undefined behavior" is not applicable to the implementation of the Standard Library, regardless of whether it is a macro, a function or anything else.

In the general case, the Standard Library should not be seen as implemented in the C++ (or C) language. That applies to the standard header files as well. The Standard Library should conform to its external specification, but everything else is an implementation detail, exempt from any and all other requirements of the language. The Standard Library should always be thought of as implemented in some "internal" language, which might closely resemble C++ or C, but still is not C++ or C.

In other words, the macro you quoted does not produce undefined behavior, as long as it is specifically the offsetof macro defined in the Standard Library. But if you do exactly the same thing in your own code (for example, define your own macro in the very same way), it will indeed result in undefined behavior. "Quod licet Jovi, non licet bovi" ("What is permitted to Jupiter is not permitted to an ox").
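One visible consequence is that the library's offsetof is required to expand to an integral constant expression. That means it works in static_assert and array bounds. A hand-written macro built on reinterpret_cast is not a constant expression in standard C++, although some compilers may accept it as an extension. Below is a minimal sketch with a hypothetical struct Header and alias Prefix:

```cpp
#include <cstddef>   // offsetof

struct Header {      // hypothetical standard-layout type
    unsigned short kind;
    unsigned short size;
    unsigned int checksum;
};

// offsetof is an integral constant expression, so it can appear wherever
// the language requires one: static_assert, array bounds, template arguments.
static_assert(offsetof(Header, kind) == 0, "first member at offset 0");

using Prefix = char[offsetof(Header, checksum)];   // the bytes before checksum
```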

Upvotes: 17

Cheers and hth. - Alf

Reputation: 145429

Where the language standard says "undefined behavior", any given compiler can define the behavior. Implementation code in the standard library typically relies on that. So there are two questions:

(1) Is the code UB with respect to the C++ standard?

That's a really hard question, because it's a well-known near-defect that the C++98/03 standard never states outright, in normative text, that dereferencing a null pointer is in general UB. It is implied by the exception for typeid, where it's not UB.

What you can say definitely is that it's UB to use offsetof with a non-POD type.
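In C++11 and later, the offsetof contract is phrased in terms of standard-layout types rather than POD, and <type_traits> can check that property at compile time. Below is a minimal sketch of guarding a use of offsetof this way. The types Plain and Poly are hypothetical:

```cpp
#include <cstddef>       // offsetof, std::size_t
#include <type_traits>   // std::is_standard_layout

struct Plain { int a; int b; };                     // standard-layout
struct Poly { virtual ~Poly() = default; int a; };  // virtual function: not standard-layout

// Refuse to compile unless Plain is within offsetof's contract.
static_assert(std::is_standard_layout<Plain>::value,
              "offsetof requires a standard-layout type");
constexpr std::size_t plain_b = offsetof(Plain, b);

// Poly would fail the same check, so it should not be passed to offsetof.
static_assert(!std::is_standard_layout<Poly>::value,
              "Poly has a virtual function");
```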

(2) Is the code UB with respect to the compiler that it's written for?

No, of course not.

A compiler vendor's code for a given compiler can use any feature of that compiler.

Cheers & hth.,

Upvotes: 26

Richard Schneider

Reputation: 35464

No, this is NOT undefined behaviour. The expression is resolved at runtime.

Note that it is taking the address of the member m from a null pointer. It is NOT dereferencing the null pointer.
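Whether this counts as dereferencing depends on how -> is specified. [expr.ref] defines E1->E2 as equivalent to (*(E1)).E2. Below is a minimal sketch, requiring C++14 or later for the local variables in the constexpr function. It uses a valid, non-null pointer and shows that both spellings designate the same member. The names Node and arrow_matches_star are hypothetical:

```cpp
struct Node { int m; };   // hypothetical type

// E1->E2 is specified as (*(E1)).E2, so both expressions below name the
// same member subobject; checked here with a valid pointer.
constexpr bool arrow_matches_star() {
    Node n{42};
    Node* p = &n;
    return &p->m == &(*p).m;
}

static_assert(arrow_matches_star(), "p->m and (*p).m designate the same member");
```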

Upvotes: -4
