I'm following along with a YouTube Computerphile buffer overflow tutorial to learn how it works. The tutorial says it's in Kali, and I'm running 64-bit Kali to test it (I think he's running 32-bit).
He writes a simple program like this:
```c
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    char buffer[500];
    strcpy(buffer, argv[1]);  /* no length check: input longer than 499 chars overflows buffer */
    return 0;
}
```
Then after starting the program in GDB he runs:
```
(gdb) run $(python -c 'print "\x41" * 506')
```
and the result is a segfault, which shows that the return address was half overwritten with two `0x41` bytes.
When I try to reproduce this, I need to change 506 to 522 to get the same result. So my questions are:
Why does 506 only overwrite two bytes instead of three when he runs it?
Why do I need to write 522 bytes to overwrite 2 bytes of the return address? I think it has to do with him probably using 32-bit instead of 64-bit Kali, but I don't really understand how this difference adds up mathematically.
When I run `disassemble main`, I see that after the function prologue comes the instruction `sub rsp, 0x210`, so it looks like 528 bytes are allocated for `buffer`. Why this number in particular (his instead subtracts 0x1f4, which is exactly 500), and how does it relate to the above, where more than 520 bytes are needed before the return address starts being overwritten?
What is happening in the range of [500, 520] bytes written, where the input is larger than the buffer but does not yet overwrite the return address?
A variation of this question is asked every month or so.
The thing is quite simple: Writing over a buffer's boundary leads to undefined behaviour, which might or might not involve a segmentation fault and overwriting any particular structure in memory.
You are assuming that there is a mandatory memory layout that everyone uses, and that's simply not true, all the more so with techniques like address space layout randomization or compiler optimizations.
Hell, why should a `main` function store a traditional return address? It could very well be inlined into the system-, compiler-, or binary-format-specific startup code.
If the compiler is clever, it'll even notice that `argv[1]` is only accessed by `strcpy`, which copies it to `buffer`, and then, considering nothing will access `buffer` anymore after `main`, it will simply not allocate anything for `buffer` and use the string at `argv[1]` directly instead. And since that is used nowhere, your `main()` would be empty but for the `return 0`, a constant expression, and hence the call to `main` could be replaced with writing 0 to `eax` or whatever your platform uses for return values.
I hate to tell you this, but aside from demonstrating that buffer overflows do in fact exist, the tutorial only shows something that works on a specific machine with a specific compiler version compiling a specific piece of code with a specific libc for a specific architecture. The result cannot be generalized.