Zebrafish

Reputation: 13876

If this is undefined behavior then why is it given as a seemingly legitimate example?

A Wikipedia article on type punning gives an example of pointing an int-type pointer at a float to extract the sign bit:

However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:

bool is_negative(float x) {
    unsigned int *ui = (unsigned int *)&x;
    return *ui & 0x80000000;
}

Is it true that pointing a pointer to a type not its own is undefined behavior? The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code? I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?

Upvotes: 2

Views: 320

Answers (5)

supercat

Reputation: 81159

When the C Standard characterizes an action as invoking Undefined Behavior, that implies that at least one of the following is true:

  1. The code is non-portable.
  2. The code is erroneous.
  3. The code is acting upon erroneous data.

One of the reasons the Standard leaves some actions Undefined is to, among other things, "identify areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." A common extension, listed in the Standard as one of the ways implementations may process constructs that invoke "Undefined Behavior", is to process some such constructs by "behaving during translation or program execution in a documented manner characteristic of the environment".

I don't think the code listed in the example claims to be 100% portable. As such, the fact that it invokes Undefined Behavior does not preclude the possibility of it being non-portable but correct. Some compiler writers believe that the Standard was intended to deprecate non-portable constructs, but such a notion is contradicted by both the text of the Standard and the published Rationale. According to the published Rationale, the authors of the Standard wanted to give programmers a "fighting chance" [their term] to write portable code, and defined a category of maximally-portable programs, but did not specify portability as a requirement for anything other than strictly conforming C programs, and they expressly did not wish to demean programs that were conforming but not strictly conforming.

Upvotes: 0

Eric Postpischil

Reputation: 222689

Why

If this is undefined behavior then why is it given as a seemingly legitimate example?

This was a common practice before C was standardized and added the rules about aliasing, and it has unfortunately persisted in practice. Nonetheless, Wikipedia pages should not be offering it as an example.

Aliasing Via Pointer Conversions

Is it true that pointing a pointer to a type not its own is undefined behavior?

The rules are more complicated than that, but, yes, many uses of an object through an lvalue of a different type are not defined by the C or C++ standards, including this one. There are also rules about pointer conversions that may be violated.

The fact that many compilers support this behavior even though the C and C++ standards do not require them to is not a reason to do so, as there is a simple alternative defined by the standards (use memcpy, below).

Using Unions

In C, an object may be reinterpreted as another type using a union. C++ does not define this:

union { float f; unsigned int ui; } u = { .f = x };
unsigned int ui = u.ui;

or the new value may be obtained more tersely using a compound literal:

(union { float f; unsigned int ui; }) {x} .ui

Naturally, float and unsigned int should have the same size when using this.
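As a rough sketch, the union approach can be wrapped into a complete C function. The name is_negative_union and the use of uint32_t in place of unsigned int are assumptions made for illustration, and the code assumes float is a 32-bit IEEE 754 value, as the question does:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32). */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float and uint32_t must have the same size");

/* Hypothetical name. Reads the float's bits through a union,
   which C defines but C++ does not. */
bool is_negative_union(float x) {
    union { float f; uint32_t ui; } u = { .f = x };
    return (u.ui & UINT32_C(0x80000000)) != 0;  /* test the top bit */
}
```

Note that this tests the sign bit only, so it also reports true for negative zero.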

Copying Bytes

Both C and C++ support reinterpreting an object by copying the bytes that represent it:

unsigned int ui;
memcpy(&ui, &x, sizeof ui);

Naturally, float and unsigned int should have the same size when using this. The above is C code; C++ requires std::memcpy or a suitable using declaration.
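Put together, a memcpy-based replacement for the question's function might look like the sketch below. The name is_negative_memcpy and the choice of uint32_t are assumptions for illustration; the code still assumes a 32-bit IEEE 754 float:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Assumption: float is 32 bits wide (IEEE 754 binary32). */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float and uint32_t must have the same size");

/* Hypothetical name. Copies the bytes of x into an integer,
   then tests the most significant bit. */
bool is_negative_memcpy(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT32_C(0x80000000)) != 0;
}
```

Mainstream optimizing compilers typically turn a small fixed-size memcpy like this into a plain register move, so there is usually no runtime cost compared with the pointer cast.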

Upvotes: 1

user3025145

Reputation: 11

Accessing data through pointers (or unions) seems pretty common in (embedded) C code, but it often requires extra knowledge.

  • If a float were smaller than an int, you would be accessing memory outside the float object.
  • The code makes several assumptions about where and how the sign bit is stored (little vs. big endian, 2s-complement).
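One way to make such assumptions explicit is to check them instead of trusting them. The following is only a sketch: the helper name sign_bit_is_bit31 is hypothetical, the compile-time checks cover size and floating-point format, and the runtime check compares bit 31 against the standard signbit macro for a few sample values:

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>    /* FLT_RADIX, FLT_MANT_DIG */
#include <limits.h>   /* CHAR_BIT */
#include <math.h>     /* signbit */
#include <stdbool.h>
#include <string.h>   /* memcpy, size_t */

/* Compile-time assumptions: same size, 32 bits, binary32-like format. */
static_assert(sizeof(float) == sizeof(unsigned int),
              "float and unsigned int differ in size");
static_assert(CHAR_BIT * sizeof(unsigned int) == 32,
              "unsigned int is not 32 bits wide");
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24,
              "float does not look like IEEE 754 binary32");

/* Hypothetical helper: returns true if bit 31 of the copied
   representation agrees with signbit() for every sample. */
bool sign_bit_is_bit31(void) {
    const float samples[] = { -1.0f, 1.0f, -0.0f, 0.0f };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        unsigned int bits;
        memcpy(&bits, &samples[i], sizeof bits);
        bool from_bits = (bits & 0x80000000u) != 0;
        bool from_lib  = signbit(samples[i]) != 0;
        if (from_bits != from_lib) {
            return false;
        }
    }
    return true;
}
```

If portability matters more than avoiding a library call, signbit itself already answers the question without any assumption about the representation.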

Upvotes: 0

John Bollinger

Reputation: 180201

Is it true that pointing a pointer to a type not its own is undefined behavior?

No, both C and C++ allow an object pointer to be converted to a different pointer type, with some caveats.

But with a few narrow exceptions, accessing the pointed-to object via the differently-typed pointer does have undefined behavior. Such undefined behavior arises from evaluating the expression *ui in the example function.
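One of those narrow exceptions is access through a character type: any object's bytes may be read through an unsigned char pointer. A minimal sketch, using the hypothetical helper name float_bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies the object representation of x into out,
   one byte at a time. Reading through unsigned char * is permitted by
   the aliasing rules, unlike reading through unsigned int *. */
void float_bytes(float x, unsigned char out[sizeof(float)]) {
    assert(out != NULL);
    const unsigned char *p = (const unsigned char *)&x;
    for (size_t i = 0; i < sizeof x; ++i) {
        out[i] = p[i];
    }
}
```

Which of these bytes holds the sign bit depends on the platform's byte order, so this shows only that the access itself is defined, not a portable way to find the sign.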

The article makes it seem as if this operation is a legitimate and common thing. What are the things that can possibly go wrong in this particular piece of code?

The behavior is undefined, so anything and everything within the power of the program to do is possible. In practice, the observed behavior might be exactly what the author(s) of the Wikipedia article expected, and if not, then the most likely misbehaviors are variations on the function computing incorrect results.

I'm interested in both C and C++, if it makes any difference. Both have the strict aliasing rule, right?

To the best of my knowledge, the example code has undefined behavior in both C and C++, for substantially the same reason.

Upvotes: 4

Brian Bi

Reputation: 119174

The fact that it is technically undefined behaviour to call this is_negative function implies that compilers are legally allowed to "exploit" this fact, e.g., in the below code:

if (condition) {
    is_negative(bar);
} else {
    // do something
}

the compiler may "optimize out" the branch, by evaluating condition and then unconditionally proceeding to the else substatement even if the condition is true.

However, because this would break enormous amounts of existing code, "real" compilers are practically forced to treat is_negative as if it were legitimate. In legal C++, the author's intent is expressed as follows:

unsigned int ui;
memcpy(&ui, &x, sizeof(x));
return ui & 0x80000000;

So the reinterpret_cast approach to type punning, while undefined according to the standard in this case, is thought of by many people as "de facto implementation-defined" and equivalent to the memcpy approach.

Upvotes: 3
