Reputation: 65
Source C Code:
#include <stdio.h>

int main()
{
    int i;
    for (i = 0; i < 10; i++)
    {
        printf("Hello World!\n");
    }
}
Dump of Intel syntax x86 assembler code for function main:
1. 0x000055555555463a <+0>: push rbp
2. 0x000055555555463b <+1>: mov rbp,rsp
3. 0x000055555555463e <+4>: sub rsp,0x10
4. 0x0000555555554642 <+8>: mov DWORD PTR [rbp-0x4],0x0
5. 0x0000555555554649 <+15>: jmp 0x55555555465b <main+33>
6. 0x000055555555464b <+17>: lea rdi,[rip+0xa2] # 0x5555555546f4
7. 0x0000555555554652 <+24>: call 0x555555554510 <puts@plt>
8. 0x0000555555554657 <+29>: add DWORD PTR [rbp-0x4],0x1
9. 0x000055555555465b <+33>: cmp DWORD PTR [rbp-0x4],0x9
10. 0x000055555555465f <+37>: jle 0x55555555464b <main+17>
11. 0x0000555555554661 <+39>: mov eax,0x0
12. 0x0000555555554666 <+44>: leave
13. 0x0000555555554667 <+45>: ret
I'm currently working through "Hacking: The Art of Exploitation, 2nd Edition" by Jon Erickson, and I'm just starting to tackle assembly.
I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.
1st Question: What is the purpose of line 6 (lea rdi,[rip+0xa2])?
My current working theory is that this is used to save where the next instructions will jump to, in order to track what is going on. I believe this line correlates with the printf function in the source C code.
So essentially, it's loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, simply to track where it will jump to for the printf function?
2nd Question: What is the purpose of line 11 (mov eax,0x0)?
I do not see a prior use of the register EAX and am not sure why it needs to be set to 0.
Upvotes: 5
Views: 1358
Reputation: 69357
This:
lea rdi,[rip+0xa2]
is a typical position-independent LEA, putting the string address into a register (instead of loading from that memory address).
Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call: the format string contains no conversion specifiers and ends in a newline, so there is nothing to format.
In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. While the lea executes, rip holds the address of the next instruction, which is at offset 0x652. So, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
The purpose of:
mov eax,0x0
is to set the return value of main(). In Intel x86, the calling convention is to return values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.
Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.
Upvotes: 3
Reputation: 364947
The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search term you're looking for is "calling convention" and/or ABI (and also RIP-relative addressing). See also: "Why is the address of static variables relative to the Instruction Pointer?"
The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)
The compiler can't use a shorter, more efficient mov edi, address in position-independent code in your Linux PIE executable. It would do that with gcc -fno-pie -no-pie.
mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register for integer values in all the standard x86 calling conventions.
If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
Upvotes: 9
Reputation: 67713
If you add debug data and symbols to the assembly, everything will be easier. The code is also easier to read if you enable some optimizations.
Godbolt is a very useful tool; here is your example: https://godbolt.org/z/9sRFmU
In the asm listing there you can clearly see that the line loads the address of the string literal, which is then printed by the function.
EAX is considered volatile, and main returns zero by default; that's the reason why it is zeroed.
The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions
Here are some more interesting cases: https://godbolt.org/z/M4MeGk
Upvotes: 0