Reputation: 450
In C on Linux, when a function is called, the prologue portion of the assembly creates a stack frame, and the local variables are addressed relative to the base pointer. My question is: what makes a variable hold an undetermined value when we print it without initializing it? My theory is that when we use the variable, the OS brings in the page corresponding to the local variable's address, and whatever value is already at that address in the page becomes the value of the local variable. Is that correct?
Upvotes: 0
Views: 89
Reputation: 853
Let's look at the disassembly of a simple program:
#include <stdio.h>
int main() {
unsigned int i;
unsigned int j = 1;
printf("%u\n", j);
printf("%u\n", i);
}
The disassembly, with GCC 11.1 at the default optimisation level, is:
.file "char.c"
.text
.section .rodata
.LC0:
.string "%u\n"
.text
.globl main
.type main, @function
/* Everything up to here is metadata and assembler directives. We're interested in what follows. */
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movl $1, -8(%rbp)
movl -8(%rbp), %eax /* See, it wrote 1 into -8(%rbp), which
represents the variable j, but didn't assign anything to
-4(%rbp), which represents the variable i */
movl %eax, %esi
leaq .LC0(%rip), %rax
movq %rax, %rdi
movl $0, %eax
call printf@PLT
movl -4(%rbp), %eax /* Now we load -4(%rbp), which is i, into
%eax for printing. Whatever is at -4(%rbp) gets printed. So, it's
undetermined */
movl %eax, %esi
leaq .LC0(%rip), %rax
movq %rax, %rdi
movl $0, %eax
call printf@PLT
movl $0, %eax
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 11.1.0-3ubuntu1) 11.1.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 8
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 8
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 8
4:
Read the comments in the disassembly for explanation.
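To make "whatever is at -4(%rbp) gets printed" concrete without actually reading an uninitialised variable, here is a minimal sketch that imitates a reused stack slot with an ordinary array. The names fake_stack, callee_that_writes and callee_that_only_reads are hypothetical and made up for this illustration; offset 12 is an arbitrary stand-in for a slot like -4(%rbp). It is a model of the idea, not what GCC emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the memory that stack frames occupy.
   Being static, it starts out zero-filled. */
static unsigned char fake_stack[16];

/* Models a function whose local lives at offset 12 and IS assigned. */
void callee_that_writes(uint32_t value) {
    memcpy(&fake_stack[12], &value, sizeof value);
}

/* Models a later function whose local gets the same slot but is never
   assigned: it simply reads whatever bytes are already there. */
uint32_t callee_that_only_reads(void) {
    uint32_t i;
    memcpy(&i, &fake_stack[12], sizeof i);
    return i;
}
```

Calling callee_that_only_reads() first returns 0. After callee_that_writes(0xDEADBEEFu), it returns 0xDEADBEEF, because nothing cleared the slot in between. On Linux a freshly mapped anonymous page is zero-filled, so the leftover values seen in practice usually come from earlier code having used the same stack memory (for example, startup code that runs before main), rather than from the page itself arriving with random contents.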
Apparently, the compiler might not even bother to load uninitialised variables into a register in some cases (not in this one; it may depend on the compiler, the optimisation level and the situation) and instead just use whatever is already in the register. I once saw someone say this; I haven't checked the ISO standard and haven't verified it. How do you even start finding such things in the standard? It's huge.
Upvotes: 1
Reputation: 222933
Consider the compiler compiling a program that correctly initializes an object:
int x = 3;
printf("%d\n", x);
int y = 4+x*7;
printf("%d\n", y);
This might result in assembly code:
Store 3 in X. // "X" refers to the stack location assigned for x.
Load address of "%d\n" into R0. // R0 is the register used for passing the first argument.
Load from X into R1. // R1 is the register for the second argument.
Call printf.
Load 4 into R1. // Start the 4 of 4+x*7.
Load from X into R2. // Get x to calculate with it.
Multiply R2 by 7. // Make x*7.
Add R2 to R1. // Finish 4+x*7.
Load address of "%d\n" into R0.
Call printf.
This is a working program. Now suppose we do not initialize x and have int x; instead. Since x is not initialized, the rules say it does not have a determined value. This means the compiler is allowed to omit all the instructions that get the value of x. So let’s take the working assembly code and remove all the instructions that get the value of x:
Load address of "%d\n" into R0. // R0 is the register used for passing the first argument.
Call printf.
Load 4 into R1. // Start the 4 of 4+x*7.
Multiply R2 by 7. // Make x*7.
Add R2 to R1. // Finish 4+x*7.
Load address of "%d\n" into R0.
Call printf.
In this program, the first printf prints whatever was in R1, because the value of x was never loaded into R1. And the calculation of x*7 uses whatever is in R2, because the value of x was never loaded into R2. So this program might print, say, “37” for the first printf, because there happened to be a 37 in R1, but it might print, say, “4” for the second printf, because there happened to be a 0 in R2. So the output of this program “looks like” x had the value 37 at one moment and the value 0 at another. The program behaves as if x does not have any fixed value.
This is a very simplified example. In practice, when a compiler removes code during optimization, it would remove more. For example, if it knows x is not initialized, it might remove not only the load of x but also the multiply by 7. However, this example serves to demonstrate the principle: when there is an uninitialized value, the compiler can radically change the code that is generated.
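The two pseudo-assembly listings can be replayed in well-defined C by modelling the registers as plain variables. This is only a sketch of the toy machine described in this answer: run_program, the Printed struct, and the stale_r1 / stale_r2 parameters (the values that "happened to be" in R1 and R2) are hypothetical names invented for the illustration, not anything a real compiler exposes.

```c
/* What the two printf calls of the toy program would print. */
typedef struct {
    int first;
    int second;
} Printed;

/* Replays the pseudo-assembly. If load_x is nonzero, the instructions
   that store and load x are kept (the initialized program); if it is
   zero, they are omitted, as in the second listing. */
Printed run_program(int load_x, int stale_r1, int stale_r2) {
    int X = 0;            /* stack slot for x; only read when load_x is set */
    int R1 = stale_r1;    /* leftover register contents */
    int R2 = stale_r2;
    Printed out;

    if (load_x) X = 3;    /* Store 3 in X. */
    if (load_x) R1 = X;   /* Load from X into R1. */
    out.first = R1;       /* Call printf: prints R1. */

    R1 = 4;               /* Load 4 into R1. */
    if (load_x) R2 = X;   /* Load from X into R2. */
    R2 *= 7;              /* Multiply R2 by 7. */
    R1 += R2;             /* Add R2 to R1. */
    out.second = R1;      /* Call printf: prints R1. */
    return out;
}
```

With the loads kept, run_program(1, 37, 0) yields 3 and 25 regardless of the stale values. With the loads removed, run_program(0, 37, 0) yields 37 and 4, matching the numbers used above: the result is determined entirely by what was left in R1 and R2.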
Upvotes: 0