Elias Van Ootegem

Reputation: 76395

Does this avoid UB

This question is more of an academic one, seeing as there is no valid reason to write your own offsetof macro anymore. Nevertheless, I've seen this home-grown implementation pop up here and there:

#define offsetof(s, m) ((size_t) &(((s *)0)->m))

Which is, technically speaking, dereferencing a NULL pointer (AFAICT):

C11 (ISO/IEC 9899:201x) §6.3.2.3 Pointers, paragraph 3

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant

So the above implementation is, according to how I read the standard, the same as writing:

#define offsetof(s, m) ((size_t) &(((s *)NULL)->m))

It does make me wonder whether, by changing one tiny detail, the following definition of offsetof would be completely legal and reliable:

#define offsetof(s, m) (((size_t)&(((s *) 1)->m)) - 1)

Seeing as, instead of 0, 1 is used as a pointer, and I subtract 1 at the end, the result should be the same. I'm no longer using a NULL pointer. As far as I can tell the results are the same.

So basically: is there any reason why using 1 instead of 0 in this offsetof definition might not work? Can it still cause UB in certain cases, and if so, when and how? Am I missing anything here?

Upvotes: 5

Views: 304

Answers (6)

supercat

Reputation: 81159

Nothing in any version of the C standard would forbid a compiler from doing anything it wanted with any macro that would attempt to achieve the effect without defining a storage location to hold the indicated object. Nonetheless, a form like:

#define offsetof(s, m) ((char*)&(((s*)0)->m)-(char*)0)

would probably be pretty safe for pre-C99 compilers. Note that it generates an integer by subtracting one char* from another. That is specified to work and yield a constant value when the pointers access parts of the same valid object, and will in practice work on any compiler which doesn't notice that a null pointer isn't a valid object. By contrast, the effect of casting a pointer to an integer or vice versa will vary on different platforms, and there are many platforms where (int)(((char*)&foo)+1) - (int)(char*)&foo may not yield 1.
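The well-defined case mentioned above, subtracting two char* values that point into the same valid object, can be sketched as follows. This is only an illustration: struct sample and value_offset_from_object are hypothetical names, and unlike the standard offsetof the result is computed at run time rather than being a constant expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct sample {
    char tag;
    int value;
    double weight;
};

/* Measures the offset of `value` on a real object. Both char pointers
   point into the same live object, so the subtraction is well defined. */
static size_t value_offset_from_object(void)
{
    struct sample obj = {0};
    const char *base = (const char *)&obj;
    const char *member = (const char *)&obj.value;
    ptrdiff_t diff = member - base;

    assert(diff >= 0); /* members never precede the start of their struct */
    return (size_t)diff;
}
```

Because an actual object exists here, no null or otherwise invalid pointer is involved at any point.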

Note also that the meaning of "Undefined Behavior" has changed recently. It used to be that Undefined Behavior meant that the specification didn't say what compilers had to do, but most compilers would generally choose (sometimes arbitrarily) behavior that was mathematically correct or would make sense on the underlying platform. For example, on a 32-bit processor, given int32_t foo=2147483647; foo+=(unsigned char)x; if (foo > 100) ... a compiler might determine that for any possible value of x the mathematically correct value assigned to foo would be in the range 2147483647 to 2147483902, and thus greater than 100 in any case. Or it might perform the operation using two's-complement arithmetic and perform the comparison on a possibly wrapped-around value. Newer compilers, however, may do something even more interesting.
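One way to keep code like the foo example out of undefined territory is to test for overflow before adding. The following is a minimal sketch, not something taken from this answer; add_u8_checked is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: adds `delta` to `*value` only when the sum fits in
   int32_t. Returns false and leaves `*value` unchanged otherwise. */
static bool add_u8_checked(int32_t *value, unsigned char delta)
{
    /* INT32_MAX - delta cannot overflow because delta is non-negative. */
    if (*value > INT32_MAX - (int32_t)delta) {
        return false;
    }
    *value += delta;
    return true;
}
```

With foo equal to 2147483647, the helper refuses any non-zero delta instead of performing a signed addition whose result the standard does not define.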

A new compiler may look at an expression like the example with foo and infer that if x is zero then foo must remain 2147483647, and if x is non-zero the compiler would be allowed to do whatever it likes. It may therefore infer that the least significant byte of x must equal zero when the statement is executed, so if the code is preceded by a test for (unsigned char)x==0, that expression would always be true. Given code like the offsetof macro, which would generate Undefined Behavior regardless of the values of any variables, a compiler would be entitled to eliminate not just any code using it, but also any preceding code which could not by any defined means cause program execution to terminate.

Note that casting a non-zero integer literal to a pointer is only Undefined Behavior if there does not exist any object whose address has been taken and cast to an integer so as to yield that same value. Thus, a compiler would not be able to recognize a variant of the pointer-difference-based offsetof macro which cast some non-zero value to a pointer as exhibiting Undefined Behavior unless it could determine that the number in question did not correspond to any pointer. On the other hand, an attempt to cast a non-zero integer to a pointer would on some systems perform a validation check to ensure that the pointer is valid; such a system may then trap if it isn't.

Upvotes: 0

2501

Reputation: 25752

One problem is that your created pointer does not point to an object.

6.2.4 Storage durations of objects

  2. The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.

and

J.2 Undefined behaviour
- The value of a pointer to an object whose lifetime has ended is used (6.2.4).

3.19.2 indeterminate value: either an unspecified value or a trap representation

When you convert 1 to a pointer, and the created pointer does not point to an object, the value of the pointer becomes indeterminate. You then use the pointer. Both of those cause undefined behavior.

The conversion of an integer to a pointer is also problematic:

6.3.2.3 Pointers

  5. An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
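For contrast, the one integer/pointer round trip the standard does pin down goes through uintptr_t and starts from a pointer to a live object. A minimal sketch, assuming the implementation provides the optional uintptr_t type; round_trip is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Converts a valid object pointer to uintptr_t and back. The standard
   guarantees that a valid void * survives this trip and compares equal
   to the original. Nothing similar is promised for an arbitrary integer
   such as 1. */
static int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}
```

The guarantee covers only values that originally came from a valid pointer, which is exactly what a literal like (s *)1 lacks.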

Upvotes: 2

ouah

Reputation: 145839

Both definitions are undefined behavior: in the first definition a null pointer is dereferenced, and in your second definition you are dereferencing an invalid pointer (the pointer does not point to a valid object). It is not possible in C to write a portable version of the offsetof macro.

Defect Report #44 says:

"In particular, this is why the offsetof macro exists: there was otherwise no portable means to compute such translation-time constants."

(DR#44 is for C89 but nothing has changed in the language in C99 and C11 that would allow a portable implementation.)
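For completeness, the portable route is the offsetof macro from <stddef.h>, which expands to an integer constant expression. A short sketch, with struct packet and the names around it being hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct packet {
    unsigned char kind;
    unsigned short length;
    unsigned long payload;
};

/* offsetof is an integer constant expression, so it can initialise an
   enumeration constant, which a run-time computation could not do. */
enum { PACKET_LENGTH_OFFSET = offsetof(struct packet, length) };

/* It can also be used in ordinary expressions. */
static size_t packet_payload_offset(void)
{
    return offsetof(struct packet, payload);
}
```

The implementation is free to define the macro however it likes internally, including with constructs that would be undefined in user code.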

Upvotes: 3

haccks

Reputation: 106012

The implementation of offsetof that dereferences a NULL pointer invokes undefined behavior. That implementation assumes that the hypothetical structure begins at address 0. You may assume it begins at 1 instead, and yes, it will invoke UB too: not because you are dereferencing a null pointer, but because an invalid pointer is dereferenced.

Upvotes: 0

user4098326

Reputation: 1742

I believe the behaviour is implementation-defined. In 6.3.2.3 of n1256:

5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

Upvotes: 3

Zebra North

Reputation: 11492

You're not actually dereferencing the pointer; what you're doing is more akin to pointer addition, so using zero should be fine.

Upvotes: -2
