morimn

Reputation: 605

Does reading uninitialized malloc() memory invoke undefined behavior?

I know this is a really basic question, and there may be a duplicate, but I couldn't find a strict answer to this specific question which refers to the Standard. (I saw some say it's UB, others say not)

If I allocate a block of memory without filling data into it,

int* ptr = malloc(10 * sizeof(int));

and then try to read it, the values there will be garbage.

But is this classified as undefined behavior? Or is it just bad practice, but at least not UB?

Upvotes: 2

Views: 1190

Answers (3)

Eric Postpischil

Reputation: 222753

Summary

The behavior of reading uninitialized memory provided by malloc is not undefined per se. It can result in undefined behavior if memory containing a trap representation is read with a non-character type, but this can occur only if the type has a trap representation. (Most modern C implementations do not have trap representations for integer types.)

However, while it is not fully undefined, neither is it fully defined. Attempting to read uninitialized memory is not required to actually read the memory.

Details

C 2018 7.22.3.4 2 says, of the malloc function with parameter size:

The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

C 3.19.2 1 defines indeterminate value as:

either an unspecified value or a trap representation

C 3.19.3 1 defines unspecified value as:

valid value of the relevant type where this document imposes no requirements on which value is chosen in any instance

Nothing in this makes the behavior undefined.

The behavior of reading a trap representation with a non-character type is not defined by the C standard, per 6.2.6.1 5. So, if the memory is read with a type that has a trap representation, and the resulting bits happen to contain values that represent a trap, then the behavior is undefined.
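As a hedged illustration of the character-type exception in that rule: the sketch below inspects memory one byte at a time through unsigned char, which has no trap representations, so every read yields some valid byte value even when the bytes are indeterminate. The helper name bytes_to_hex is hypothetical (it is not from the standard or from this answer), and the code assumes 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write the object representation of the n bytes
   at p as lowercase hex into out, which must hold 2*n + 1 chars.
   Every byte is read through unsigned char, a type with no trap
   representations. Assumes CHAR_BIT == 8. */
void bytes_to_hex(const void *p, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    const unsigned char *bytes = p;

    assert(p != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[(bytes[i] >> 4) & 0xF];
        out[2 * i + 1] = digits[bytes[i] & 0xF];
    }
    out[2 * n] = '\0';
}
```

Called on a freshly allocated block, this produces a hex string whose contents are unspecified, so no particular output should be expected.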

Trap representations in integer types are rare in modern C implementations. Many years ago, some systems would reserve certain bit patterns, such as the 16-bit 8000₁₆, to represent uninitialized or invalid data, and attempting to use such a value in arithmetic would generate a trap. In a C implementation without trap representations in some type T, accessing uninitialized data through type T cannot encounter a trap representation. So the result must be an unspecified (and hence valid) value of the type.

Further, there is nothing else in the C standard that would make this behavior undefined. There is a rule in 6.3.2.1 2 that accessing an uninitialized object of automatic storage duration has undefined behavior if its address is not taken. However, the memory provided by malloc has allocated storage duration, not automatic. (That rule is an accommodation to certain Hewlett-Packard hardware with the capability of marking a register as uninitialized and trapping when it is used.)

Also, whole structures and unions are never trap representations, regardless of the types of their members. The most common trap representation in modern C implementations is a floating-point signaling NaN (Not a Number).

Note that the value in the allocated memory is unspecified, and the definition above states “this document imposes no requirements on which value is chosen in any instance.” That means if you do this:

unsigned *p = malloc(sizeof *p);
printf("%u\n", *p);
printf("%u\n", *p);

the C standard imposes no requirement on which value is chosen for *p in the first printf and no requirement on which value is chosen in the second printf, not even a requirement that they be the same as each other. An “unspecified value” may act like it has bits that are changing by themselves from moment to moment. So, the behavior is not undefined—it cannot allow “anything” to happen to your program; your program cannot suddenly jump to a different function or wipe out other data—but neither is it defined to act like the memory has bits with fixed values.

This means you cannot reliably read the uninitialized memory—reads of the memory are not guaranteed to produce the bits that are actually in physical memory.
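If a program needs stable values, the usual way out is to give the memory a value before reading it. A minimal sketch, assuming hypothetical helper names alloc_zeroed, alloc_filled, and reads_agree (none of them come from the standard or from this answer): once every byte has been written, the values are no longer indeterminate and repeated reads must agree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n unsigned values that all start at 0.
   calloc sets all bits to zero, which is the value 0 for integer types. */
unsigned *alloc_zeroed(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(unsigned));
}

/* Hypothetical helper: allocate with malloc, then give every byte a
   determinate value with memset before anything reads the block. */
unsigned *alloc_filled(size_t n, unsigned char byte)
{
    assert(n > 0);
    if (n > SIZE_MAX / sizeof(unsigned))
        return NULL;                     /* size computation would overflow */
    unsigned *p = malloc(n * sizeof *p);
    if (p != NULL)
        memset(p, byte, n * sizeof *p);
    return p;
}

/* Reads the same element twice. For initialized memory the two reads
   must produce the same value, so this returns 1. */
int reads_agree(const unsigned *p, size_t i)
{
    unsigned first = p[i];
    unsigned second = p[i];
    return first == second;
}
```

The caller still has to check the returned pointer for NULL and free it when done.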

Discussion

To see why the C standard allows the program to act like the bits in memory may be changing, consider this code:

unsigned a = *p + 3;
unsigned b = *p + 4;

For that code in normal situations, the compiler might generate assembly like this:

// As we start, registers r7, r8, and r9 already contain p,
// the address of a, and the address of b, respectively.
load  r3, (r7) // Get value of *p from memory.
add   r3, #3   // Add 3.
store r3, (r8) // Store sum to a.
load  r3, (r7) // Get value of *p from memory.
add   r3, #4   // Add 4.
store r3, (r9) // Store sum to b.

If the memory p points to happened to contain 0, then these instructions would store 3 in a and 4 in b. However, the rule that uninitialized memory is not required to behave as if it had a fixed value means the compiler’s optimizer is allowed to eliminate the load instructions. Hypothetically, that could result in instructions such as:

add   r3, #3   // Add 3.
store r3, (r8) // Store sum to a.
add   r3, #4   // Add 4.
store r3, (r9) // Store sum to b.

If r3 happens to contain 0 when this code sequence starts, then 3 will be stored in a, and 7 will be stored in b. There is no possible value *p could have that would result in *p + 3 being 3 and *p + 4 being 7. So this code acts as if *p has changed by itself.

In practice, an optimizer that removed the load instructions here would also recognize that the subsequent instructions are disconnected from any fixed value and remove them too. However, real-world optimizations get more complex than this. The license granted by the C standard allows the compiler to remove the parts of the code that it can figure out are not using defined values, even if it cannot figure out everything about the program.

Upvotes: 7

supercat

Reputation: 81159

There are some situations where it may be useful for an implementation to guarantee that repeated reads of malloc() storage will yield consistent values unless or until it is written, and other situations where it may be useful for an implementation not to be bound by such a guarantee. The question of whether any particular implementation should offer such a guarantee is therefore a Quality of Implementation issue outside the Standard's jurisdiction.

As to whether a read of such storage may trigger side effects beyond yielding a meaningless value, that's not really clear. Certainly it may be useful for a diagnostic implementation to trap on such reads, but the Standard doesn't explicitly provide for such things. On the other hand, I don't think the Standard explicitly states that storage returned from malloc won't behave as though arbitrary objects have been written to it, thus causing such storage to be associated with arbitrary Effective Types. Such questions again boil down to Quality of Implementation issues outside the Standard's jurisdiction.

Upvotes: -1

Paul Ogilvie

Reputation: 25286

Yes, you can read it.

The memory manager gave you the memory through your call to malloc so now it is yours and you can do anything you want with it. That includes reading it, e.g. printing it as an array of ints.

Eric discusses how the compiler may optimize away instructions. However, his discussion is based on the fact that the compiler knows about malloc and that it returns "uninitialized" memory, that is, memory that the user program did not assign a value to.

But in essence, malloc is just a function that returns an object, and the compiler must assume the object returned has meaningful values, just as when a user function returns an object. Nor can there be "in-band" trap values, as any value the memory has can be a legal value (all 32 bits of a 32-bit int are used). There can only be out-of-band trap values, that is, additional hardware like an extra hardware bit. But then trap values become a run-time matter.

Had the user called realloc, which expands an existing block of memory (even a "no block" of memory), then the compiler, even knowing about realloc, could not assume anything about the returned object and could not optimize instructions away. There is just nothing the compiler can assume.
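For what it's worth, the C standard's description of realloc says the contents up to the smaller of the old and new sizes are preserved, while any bytes beyond the old size have indeterminate values. A minimal sketch of handling that, using a hypothetical helper grow_and_clear (the name and the choice to zero the tail are assumptions for illustration, not anything from this answer):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grow an int array from old_n to new_n elements.
   realloc preserves the first old_n elements; the added tail is
   indeterminate, so this sketch sets it to zero before returning. */
int *grow_and_clear(int *p, size_t old_n, size_t new_n)
{
    assert(new_n > 0 && new_n >= old_n);
    if (new_n > SIZE_MAX / sizeof(int))
        return NULL;                 /* size computation would overflow */
    int *q = realloc(p, new_n * sizeof *q);
    if (q == NULL)
        return NULL;                 /* the original block p is still valid */
    for (size_t i = old_n; i < new_n; i++)
        q[i] = 0;                    /* give the new tail determinate values */
    return q;
}
```

After a successful call, every element of the returned array can be read with a well-defined result: the old elements keep their values and the new ones are 0.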

Note that the memory manager may have set the memory to some unspecified value to prevent a program from reading data from the memory left by other uses, as a security measure.

Upvotes: 0
