Franc

Reputation: 450

Why do local variables have undetermined values in C if they are not initialized?

In C on Linux, when a function is called, the prologue portion of the generated assembly creates a stack frame, and the local variables are addressed relative to the base pointer. My question is: what makes a variable hold an undetermined value when we print it without initializing it? My theory is that when we use the variable, the OS brings in the page corresponding to the local variable's address, and whatever value already sits at that address in the page becomes the value of the local variable. Is that correct?

Upvotes: 0

Views: 89

Answers (2)

Shambhav

Reputation: 853

Let's look at the assembly generated for a simple program:

#include <stdio.h>

int main() {
    unsigned int i;
    unsigned int j = 1;
    printf("%u\n", j);
    printf("%u\n", i);
}

The assembly output from GCC 11.1 at the default optimisation level (-O0) is:

    .file   "char.c"
    .text
    .section    .rodata
.LC0:
    .string "%u\n"
    .text
    .globl  main
    .type   main, @function
/* Everything up to here is metadata and directives. We're interested in what follows. */

main:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    $1, -8(%rbp)
    movl    -8(%rbp), %eax /* See, it wrote 1 into -8(%rbp), which
represents the variable j, but didn't assign anything to
 -4(%rbp), which represents the variable i */
    movl    %eax, %esi
    leaq    .LC0(%rip), %rax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf@PLT
    movl    -4(%rbp), %eax /* Now we load -4(%rbp), which is i, into
 %eax for printing. Whatever is at -4(%rbp) gets printed. So, it's
 undetermined */
    movl    %eax, %esi
    leaq    .LC0(%rip), %rax
    movq    %rax, %rdi
    movl    $0, %eax
    call    printf@PLT
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 11.1.0-3ubuntu1) 11.1.0"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align 8
    .long   1f - 0f
    .long   4f - 1f
    .long   5
0:
    .string "GNU"
1:
    .align 8
    .long   0xc0000002
    .long   3f - 2f
2:
    .long   0x3
3:
    .align 8
4:

Read the comments in the assembly listing for the explanation.

Apparently, in some cases the compiler might not even bother to load uninitialised variables into a register and will instead just use whatever is already in the register (not in this case; it may depend on the compiler, the optimisation level and the situation). I once saw someone say this; I haven't checked the ISO standard or verified it myself. How do you even start finding such things in the standard? It's huge.

Upvotes: 1

Eric Postpischil

Reputation: 222933

Consider the compiler compiling a program that correctly initializes an object:

int x = 3;
printf("%d\n", x);
int y = 4+x*7;
printf("%d\n", y);

This might result in assembly code:

Store 3 in X.                   // "X" refers to the stack location assigned for x.
Load address of "%d\n" into R0. // R0 is the register used for passing the first argument.
Load from X into R1.            // R1 is the register for the second argument.
Call printf.
Load 4 into R1.                 // Start the 4 of 4+x*7.
Load from X into R2.            // Get x to calculate with it.
Multiply R2 by 7.               // Make x*7.
Add R2 to R1.                   // Finish 4+x*7.
Load address of "%d\n" into R0.
Call printf.

This is a working program. Now suppose we do not initialize x and have int x; instead. Since x is not initialized, the rules say it does not have a determined value. This means the compiler is allowed to omit all the instructions that get the value of x. So let’s take the working assembly code and remove all the instructions that get the value of x:

Load address of "%d\n" into R0. // R0 is the register used for passing the first argument.
Call printf.
Load 4 into R1.                 // Start the 4 of 4+x*7.
Multiply R2 by 7.               // Make x*7.
Add R2 to R1.                   // Finish 4+x*7.
Load address of "%d\n" into R0.
Call printf.

In this program, the first printf prints whatever was in R1, because the value of x was never loaded into R1. And the calculation of x*7 uses whatever is in R2, because the value of x was never loaded into R2. So this program might print, say, “37” for the first printf, because there happened to be a 37 in R1, but it might print, say, “4” for the second printf, because there happened to be a 0 in R2 (4 + 0*7 = 4). So the output of this program “looks like” x had the value 37 at one moment and the value 0 at another. The program behaves as if x does not have any fixed value.

This is a very simplified example. In practice, when a compiler removes code during optimization, it would remove more. For example, if it knows x is not initialized, it might remove not only the load of x but also the multiply by 7. However, this example serves to demonstrate the principle: when there is an uninitialized value, the compiler can radically change the code that is generated.

Upvotes: 0
