Reputation: 1163

Would reinterpreting data be undefined behavior?

Someone recently brought it up that this:

uint8_t a = 0b10000000;
int8_t b = *(int8_t*) &a;

is undefined behavior, because the value of a is outside of what I can represent in int8_t. Can someone explain why exactly this is undefined behavior?

My main issue is that the memory is there and is valid as storage for an int8_t; the only difference is that int8_t interprets that byte as -128, while uint8_t interprets it as 128. I am further confused because the fast inverse square root uses:

float y =  /* Some val*/;
int32_t i  = * ( int32_t * ) &y;

This will give a value of i in essence unrelated (apart from the IEEE standard) to y, so I don't see why reinterpreting a piece of memory could be undefined behavior.

Upvotes: 9

Answers (3)

supercat

Reputation: 81217

Rather than trying to define all of the behaviors necessary to accomplish every plausible task, the authors of the C and C++ Standards instead allow implementations to support various useful behaviors or not, at their leisure, on the presumption that compiler writers will be able to know and support their customers' needs far better than the Committee ever could.

If one is targeting a platform where all pointers are the same size and have the same representation (true of nearly all implementations for current processor and controller designs), one ensures that any pointer used to access an object of a particular type satisfies the platform's alignment requirements for that type (true if the address is a multiple of the size of the largest primitive type), and one uses a compiler configuration that is specified to support straightforward type punning patterns (e.g. -fno-strict-aliasing on clang or gcc), then type punning code will work as expected on that compiler configuration. Such code will not be portable to all other implementations or configurations, but portability is just one factor upon which the quality of code should be judged. If code will run efficiently and correctly on all C implementations where it will be used, replacing it with code that is slower and/or harder to read purely for purposes of making it "portable" would not be an improvement.

Incidentally, every compiler configuration I've tested either uses an abstraction model that supports useful type-punning constructs beyond those mandated by the Standards, or fails to uphold all of the memory-recycling constructs for which the Standard mandates support. It would be impossible for a compiler to behave as specified in all cases where the Standard defines behavior without also behaving in a fashion consistent with writing and reading object representations in many cases where the Standard imposes no requirements; presumably the authors of the Standard expected compilers to accommodate that difficulty by behaving usefully in more cases than required by the Standard, but when optimizations are enabled, clang and gcc prioritize "optimization" over correctness.

Upvotes: 0

Lala5th

Reputation: 1163

Thanks for all the comments. I went down a rabbit hole of strict aliasing and found that, contrary to what I believed, the fast inverse square root is undefined behavior, but my initial code does not seem to be. That is not because uint8_t is special, but because the standard has a rule for signed/unsigned interchange:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined: [...] (11.2) a type that is the signed or unsigned type corresponding to the dynamic type of the object

So there is no issue in theory, as uint8_t is the unsigned type corresponding to int8_t.
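
As a minimal sketch of what that rule appears to permit, the helper below reads a uint8_t object through an int8_t lvalue. The name as_signed is made up for illustration, and the sketch assumes that int8_t and uint8_t are the signed and unsigned variants of the same 8-bit type, which holds on the usual platforms where they are signed char and unsigned char.

```cpp
#include <cstdint>

// Hypothetical helper (not from the standard library): reads the byte stored
// in a uint8_t object through an int8_t lvalue. Assumes int8_t is the signed
// type corresponding to uint8_t, so the access falls under the
// signed/unsigned aliasing rule quoted above.
inline std::int8_t as_signed(const std::uint8_t& u) {
    return *reinterpret_cast<const std::int8_t*>(&u);
}
```

Called with a byte whose top bit is set, such as 0b10000000, this yields -128, because int8_t is a two's complement type with no padding bits.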

Upvotes: 8

Fatih BAKIR

Reputation: 4725

The problem is not the reinterpretation of the data, but the reinterpretation of the pointer. This is problematic for the following (non-exhaustive) reasons:

The standard does not require all pointers to be the same size, so sizeof(float*) does not have to equal sizeof(int*), and the conversion may lose data.
If you obtain a uint32_t* from a float* and read through it, you are reading a uint32_t object that was never created at that address.
As you said, compilers assume that two pointers to different types never alias (with exceptions such as char, unsigned char, and std::byte, which may alias anything), and they optimize based on that assumption.
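
The last point can be illustrated with a small sketch. The function name store_both is hypothetical, and whether a given compiler actually performs this optimization depends on its version and flags.

```cpp
// Hypothetical function for illustration only.
// Because int and float are unrelated types, the compiler may assume that
// the store through f cannot change the int that i points to.
inline int store_both(int* i, float* f) {
    *i = 1;
    *f = 2.0f;   // assumed not to modify *i
    return *i;   // an optimizer may therefore return the constant 1
}
```

With two distinct objects this is well defined and returns 1. If a caller passed the same address for both arguments (by casting one pointer), the behavior would be undefined, and an optimized build might still return 1 even though the bytes in memory now hold the representation of 2.0f.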

However, converting between the bit representations of unrelated types is sometimes a legitimate requirement. Traditionally, this has been done using memcpy, but C++20 added std::bit_cast (declared in <bit>), which can do this reinterpretation even in constexpr, so the following is legal and expresses our intention directly:

#include <bit>
#include <cstdint>
constexpr float pi = 3.14f;
constexpr std::uint32_t pi_bits = std::bit_cast<std::uint32_t>(pi);
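
For compilers without C++20, the memcpy approach mentioned above can be sketched as follows, applied to the fast inverse square root from the question. The helper names float_to_bits, bits_to_float and fast_rsqrt are made up for this example. The sketch assumes a 32-bit IEEE 754 float and uses the widely cited magic constant 0x5f3759df with a single Newton step.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: copies the object representation of a float
// into an integer without creating an aliasing pointer.
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: the inverse of float_to_bits.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Approximates 1/sqrt(x) for positive, normal x (assumes IEEE 754 floats).
inline float fast_rsqrt(float x) {
    std::uint32_t i = float_to_bits(x);
    i = 0x5f3759dfu - (i >> 1);            // initial guess from the bit pattern
    float y = bits_to_float(i);
    return y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement
}
```

Mainstream compilers typically compile these fixed-size memcpy calls down to a plain register move, so this form usually costs nothing over the pointer cast while staying within defined behavior.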

Upvotes: 1
