Joel
Joel

Reputation: 2175

Does std::memcpy make its destination determinate?

Here is the code:

unsigned int a;            // a is indeterminate
unsigned long long b = 1;  // b is initialized to 1
std::memcpy(&a, &b, sizeof(unsigned int));
unsigned int c = a;        // Is this not undefined behavior? (Implementation-defined behavior?)

Is a guaranteed by the standard to be a determinate value where we access it to initialize c? Cppreference says:

void* memcpy( void* dest, const void* src, std::size_t count );

Copies count bytes from the object pointed to by src to the object pointed to by dest. Both objects are reinterpreted as arrays of unsigned char.

But I don't see anywhere in cppreference that says if an indeterminate value is "copied to" like this, it becomes determinate.

From the standard, it seems it's analogous to this:

unsigned int a;            // a is indeterminate
unsigned long long b = 1;  // b is initialized to 1
auto* a_ptr = reinterpret_cast<unsigned char*>(&a);
auto* b_ptr = reinterpret_cast<unsigned char*>(&b);
a_ptr[0] = b_ptr[0];
a_ptr[1] = b_ptr[1];
a_ptr[2] = b_ptr[2];
a_ptr[3] = b_ptr[3];
unsigned int c = a;        // Is this undefined behavior? (Implementation defined behavior?)

It seems like the standard leaves room for this to be allowed, because the type aliasing rules allow for the object a to be accessed as an unsigned char this way. But I can't find something that says this makes a no longer indeterminate.

Upvotes: 11

Views: 562

Answers (3)

Joel
Joel

Reputation: 2175

TL;DR: It's implementation-defined whether there will be undefined behavior or not. Proof-style, with lines of code numbered:


  1. unsigned int a;

The variable a is assumed to have automatic storage duration. Its lifetime begins (6.6.3/1). Since it is not a class, its lifetime begins with default initialization, in which no other initialization is performed (9.3/7.3).

  1. unsigned long long b = 1ull;

The variable b is assumed to have automatic storage duration. Its lifetime begins (6.6.3/1). Since it is not a class, its lifetime begins with copy-initialization (9.3/15).

  1. std::memcpy(&a, &b, sizeof(unsigned int));

Per 16.2/2, std::memcpy should have the same semantics and preconditions as the C standard library's memcpy. In the C standard 7.21.2.1, assuming sizeof(unsigned int) == 4, 4 characters are copied from the object pointed to by &b into the object pointed to by &a. (These two points are what is missing from other answers.)

At this point, the sizes of unsigned int, unsigned long long, their representations (e.g. endianness), and the size of a character are all implementation defined (to my understanding, see 6.7.1/4 and its note saying that ISO C 5.2.4.2.1 applies). I will assume that the implementation is little-endian, unsigned int is 32 bits, unsigned long long is 64 bits, and a character is 8 bits.

Now that I have said what the implementation is, I know that a has a value-representation for an unsigned int of 1u. Nothing, so far, has been undefined behavior.

  1. unsigned int c = a;

Now we access a. Then, 6.7/4 says that

For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

I know now that the value of a is determined by the implementation-defined value bits in a, which I know hold the value-representation for 1u. The value of a is then 1u.

Then like (2), the variable c is copy-initialized to 1u.


We made use of implementation-defined values to find what happens. It is possible that the implementation-defined value of 1ull is not one of the implementation-defined set of values for unsigned int. In that case, accessing a will be undefined behavior, because the standard doesn't say what happens when you access a variable with a value-representation that is invalid.

AFAIK, we can take advantage of the fact that most implementations define an unsigned int where any possible bit pattern is a valid value-representation. Therefore, there will be no undefined behavior.

Upvotes: 1

Lemon Drop
Lemon Drop

Reputation: 2133

Note: I updated this answer since by exploring the issue further in some of the comments has reveled cases where it would be implementation defined or even undefined in a case I did not consider originally (specifically in C++17 as well).

I believe that this is either implementation defined behavior in some cases and undefined in others (as another answer came to conclude for similar reasons). In a sense it's implementation defined if it's undefined behavior or implementation defined, so I am not sure if it being undefined in general takes precedence in such a classification.

Since std::memcpy works entirely on the object representation of the types in question (by aliasing the pointers given to unsigned char as is specified by 6.10/8.8 [basic.lval]). If the bits within the bytes in question of the unsigned long long are guaranteed to be something specific then you can manipulate them however you wish or write them into the object representation of any other type. The destination type will then use the bits to form its value based on its value representation (whatever that may be) as is defined in 6.9/4 [basic.types]:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

And that:

The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

Knowing this, all that matters now is what the object representation of the integer types in question are. According to 6.9.1/7 [basic.fundemental]:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system. [Example: This International Standard permits two’s complement, ones’ complement and signed magnitude representations for integral types. — end example ]

A footnote does clarify the definition of "binary numeration system" however:

A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)

We also know that unsigned integers have the same value representation as signed integers, just under a modulus according to 6.9.1/4 [basic.fundemental]:

Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.

While this does not say exactly what the value representation may be, based on the specified definition of a binary numeration system, successive bits are to be additive powers of two as expected (rather than allowing the bits to be in any given order), with the exception of the maybe present sign bit. Additionally since signed and unsigned value representations this means an unsigned integer will stored as an increasing binary sequence up until 2^(n-1) (past then depending on how signed number are handled things are implementation defined).

There are still some other considerations however, such as endianness and how many padding bits may be present due to sizeof(T) only measuring the size of the object representation rather than the value representation (as stated before). Since in C++17 there is no standard way (I think) to check for endianness, this is the main factor that would leave this to be implementation defined in what the outcome would be. As for padding bits, while they may be present (but not specified where they will be from what I can tell other than the implication that they will not interrupt the contiguous sequence of bits forming the value representation of a integer), writing to them can prove potentially problematic. Since the intent of the C++ memory model is based on the C99 standard's memory model in a "comparable" way, a footnote from 6.2.6.2 (which is referenced in the C++20 standard as a note to remind that it's based on that) can be taken which say as follows:

Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

This implies that writing directly to padding bits incorrectly could potentially generate a trap representation from what I can tell.

This shows that in some cases depending on if padding bits are present and endianness, the result can be influenced in an implementation-defined manner. If some combination of padding bits is also a trap representation, this may become undefined behavior.

While not possible in C++17, in C++20 one can use std::endian in conjunction with std::has_unique_object_representations<T> (which was present in C++17) or some math with CHAR_BIT, UINT_MAX/ULLONG_MAX and the sizeof those types to ensure the expected endianness is correct as well as the absence of padding bits, allowing this to actually produce the expected result in a defined manner given what was previously established with how integers are said to be stored. Of course C++20 also further refines this and specifies that integer are to be stored in two's complement alone eliminating further implementation-specific issues.

Upvotes: 0

Nicol Bolas
Nicol Bolas

Reputation: 473856

Is this not undefined behavior

It's UB, because you're copying into the wrong type. [basic.types]2 and 3 permit byte copying, but only between objects of the same type. You copied from a long long into an int. That has nothing to do with the value being indeterminate. Even though you're only copying sizeof(int) bytes, the fact that you're not copying from an actual int means that you don't get the protection of those rules.

If you were copying into the value of the same type, then [basic.types]3 says that it's equivalent to simply assigning them. That is, a " shall subsequently hold the same value as" b.

Upvotes: 4

Related Questions