MiterV

Reputation: 65

Why does initializing a variable `i` to 0 or to a large value result in the same program size?

There is a problem which confuses me a lot.

int main(int argc, char *argv[])
{
    int i = 12345678;
    return 0;
}

int main(int argc, char *argv[])
{
    int i = 0;
    return 0;
}

[image: size]

The programs have the same bytes in total. Why?

And where is the literal value actually stored? In the text segment or somewhere else?

[image: memory map]

Upvotes: 6

Views: 224

Answers (4)

Enzo Ferber

Reputation: 3104

TL;DR

First question: They're the same size because the instructions generated for the two programs are nearly identical (more on that below). In particular, an int always occupies the same number of bytes, whatever value it holds.

Second question: The variable i is stored in the local variables frame, which is on the function's stack. The actual value you assign to i is hardcoded in the instructions, in the text segment.


GDB and Assembly

I know you're using Windows, but consider this code and output on Linux. I used exactly the same sources you posted.

For the first one, with i = 12345678, main compiles to these instructions:

(gdb) disass main
Dump of assembler code for function main:
   0x00000000004004ed <+0>: push   %rbp
   0x00000000004004ee <+1>: mov    %rsp,%rbp
   0x00000000004004f1 <+4>: mov    %edi,-0x14(%rbp)
   0x00000000004004f4 <+7>: mov    %rsi,-0x20(%rbp)
   0x00000000004004f8 <+11>:movl   $0xbc614e,-0x4(%rbp)
   0x00000000004004ff <+18>:mov    $0x0,%eax
   0x0000000000400504 <+23>:pop    %rbp
   0x0000000000400505 <+24>:retq   
End of assembler dump.

As for the other program, with i = 0, main is:

(gdb) disass main
Dump of assembler code for function main:
   0x00000000004004ed <+0>: push   %rbp
   0x00000000004004ee <+1>: mov    %rsp,%rbp
   0x00000000004004f1 <+4>: mov    %edi,-0x14(%rbp)
   0x00000000004004f4 <+7>: mov    %rsi,-0x20(%rbp)
   0x00000000004004f8 <+11>:movl   $0x0,-0x4(%rbp)
   0x00000000004004ff <+18>:mov    $0x0,%eax
   0x0000000000400504 <+23>:pop    %rbp
   0x0000000000400505 <+24>:retq   
End of assembler dump.

The only difference between the two listings is the actual value being stored in your variable. Let's go step by step through these lines below (my computer is x86_64, so if your architecture is different, the instructions may differ).


OPCODES

And the actual machine-code bytes of main (using objdump):

00000000004004ed <main>:
  4004ed:   55                      push   %rbp
  4004ee:   48 89 e5                mov    %rsp,%rbp
  4004f1:   89 7d ec                mov    %edi,-0x14(%rbp)
  4004f4:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
  4004f8:   c7 45 fc 4e 61 bc 00    movl   $0xbc614e,-0x4(%rbp)
  4004ff:   b8 00 00 00 00          mov    $0x0,%eax
  400504:   5d                      pop    %rbp
  400505:   c3                      retq   
  400506:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40050d:   00 00 00 

To get the actual difference in bytes, run objdump -D prog1 > prog1_dump and objdump -D prog2 > prog2_dump, and then diff prog1_dump prog2_dump:

2c2
< draft1:     file format elf64-x86-64
---
> draft2:     file format elf64-x86-64
51,58c51,58
<   400283: 00 bc f6 06 64 9f ba    add    %bh,-0x45609bfa(%rsi,%rsi,8)
<   40028a: 01 3b                   add    %edi,(%rbx)
<   40028c: 14 d1                   adc    $0xd1,%al
<   40028e: 12 cf                   adc    %bh,%cl
<   400290: cd 2e                   int    $0x2e
<   400292: 11 77 5d                adc    %esi,0x5d(%rdi)
<   400295: 79 fe                   jns    400295 <_init-0x113>
<   400297: 3b                      .byte 0x3b
---
>   400283: 00 e8                   add    %ch,%al
>   400285: f1                      icebp  
>   400286: 6e                      outsb  %ds:(%rsi),(%dx)
>   400287: 8a f8                   mov    %al,%bh
>   400289: a8 05                   test   $0x5,%al
>   40028b: ab                      stos   %eax,%es:(%rdi)
>   40028c: 48 2d 3f e9 e2 b2       sub    $0xffffffffb2e2e93f,%rax
>   400292: f7 06 53 df ba af       testl  $0xafbadf53,(%rsi)
287c287
<   4004f8: c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
---
>   4004f8: c7 45 fc 4e 61 bc 00    movl   $0xbc614e,-0x4(%rbp)

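Note that your number is there at address 0x4004f8: 4e 61 bc 00 in prog2 and 00 00 00 00 in prog1. Both are 4 bytes, which equals sizeof(int); the value 0xbc614e appears byte-reversed because x86 stores the least significant byte first. The bytes c7 45 fc are the rest of the instruction (the encoding for "move an immediate value to an offset from rbp"). Also note that the first differing region has the same size in both files (21 bytes). objdump -D disassembles every section, so those lines are most likely non-code data (such as a build ID) shown as bogus instructions. So, there you go: although slightly different, they're the same size.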


Step by step through Assembly Instructions

  1. push %rbp; mov %rsp, %rbp: This is called setting up the Stack Frame, and is standard for all C functions (unless you tell gcc -fomit-frame-pointer). This enables you to access the stack and your local variables through a fixed register, in this case, rbp.

  2. mov %edi, -0x14(%rbp): This moves the content of register edi into our local variables frame, specifically at offset -0x14.

  3. mov %rsi, -0x20(%rbp): Same here, but this time it saves rsi. Under the x86_64 calling convention, arguments arrive in registers (instead of being pushed on the stack as on 32-bit x86). Unoptimized code copies them into the local variables frame right away, which frees those registers for other work. Registers are far faster than memory, so the more free registers we have, the better.

Note: edi is the low 4 bytes of the rdi register, and from the x86_64 calling convention we know that rdi is used for the first argument. main's first argument is int argc, so it makes sense to use a 4-byte register to store it. rsi holds the second argument, argv, which is a pointer to pointer to char (char **argv). Pointers are 8 bytes on x86_64, so it fits exactly into a 64-bit register.

  4. <+11>: movl $0xbc614e,-0x4(%rbp): This is the actual line int i = 12345678 (0xbc614e = 12345678 in decimal). Note that the way we "move" that value is very similar to how we stored the main arguments. We use offset -0x4(%rbp) to store it in memory, in the "local variables frame" (this answers your question about where it gets stored).

  5. mov $0x0, %eax; pop %rbp; retq: Again, dull stuff: set the return value to 0, restore the caller's frame pointer and return (ending the program, since we're in main).

  6. Note that in the second example, the only difference is the line <+11>: movl $0x0,-0x4(%rbp), which stores the value zero. In C terms, int i = 0.

So, from these instructions you can see that the main function of both programs is translated to assembly in exactly the same way, so their sizes are the same in the end. (This assumes you compiled them the same way, because the compiler and linker also put lots of other things in the binaries, like data, library functions, etc.) On Linux, you can get a full disassembly dump using objdump -D program.

Note 2: In these examples, you cannot see how the computer subtracts values from rsp in order to allocate stack space, but that's how it's normally done.


Stack Representation

The stack would look like this in both cases (only the value of i, stored at -0x4(%rbp), would change):

|        ~~~       |  Higher Memory addresses 
|                  |
+------------------+ <--- Address 0x8(%rbp)
| RETURN ADDRESS   | 
+------------------+ <--- Address 0x0(%rbp) // instruction push %rbp
| previous rbp     | 
+------------------+ <--- Address -0x4(%rbp)
| i=0x11223344     | 
+------------------+ <---- Address -0x14(%rbp)
| argc             |
+------------------+  <---- address -0x20(%rbp)
| argv             | 
+------------------+
|                  |
+~~~~~~~~~~~~~~~~~~+ Lower memory addresses

Note 3: The direction in which the stack grows depends on your architecture. The byte order in which data is written to memory also depends on your architecture.



Upvotes: 1

machine_1

Reputation: 4454

Consider your bedroom. Whether you fill it with stuff or leave it empty, does that change the area of your bedroom? The size of an int is sizeof(int). It does not matter what value you store in it.

Upvotes: 5

Thomas Ayoub

Reputation: 29451

Because your program is optimized. At compile time, the compiler found out that i was unused and removed it.

If optimization didn't occur, another simple explanation is that one int is the same size as any other int.

Upvotes: 1

cadaniluk

Reputation: 15229

The programs have the same bytes in total. Why?

There are two possibilities:

  1. The compiler is optimizing out the variable. It isn't used anywhere, so there is no reason to keep it.

  2. If 1. doesn't apply, the program sizes are equal anyway. Why shouldn't they be? 0 takes just as much space as 12345678. Two variables of type T occupy the same amount of memory.

And where is the literal value actually stored?

On the stack. Local variables are commonly stored on the stack.

Upvotes: 9
