Reputation: 13
I am trying to understand how GDB works and how memory is allocated. When I run the following command, it is supposed to write 72 A's into memory, but when I count the bytes in memory, there are only 68 A's. Then there are 4 bytes of seemingly random data before the B's start. When I count the A's in the printed output, there are 72 A's.
0xbffff080: 0x14 0x84 0x04 0x08 0x41 0x41 0x41 0x41
0xbffff088: 0x42 0x42 0x42 0x42 0x42 0x42 0x42 0x42
Full command below.
(gdb) run $( python -c "print('A'*72+'BBBB')" )
Starting program: /home/ubuntu/Desktop/test $( python -c "print('A'*72+'BBBB')" )
Breakpoint 2, 0x08048473 in getName (
name=0xbffff32c 'A' <repeats 72 times>, "BBBB") at sample1.c:7
7 printf("Your name is: %s \n", myName);
(gdb) c
Continuing.
Your name is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB
Program received signal SIGSEGV, Segmentation fault.
0xbffff32c in ?? ()
(gdb) x/150xb $sp-140
0xbffff038: 0x50 0xf0 0xff 0xbf 0x54 0x82 0x04 0x08
0xbffff040: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff048: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff050: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff058: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff060: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff068: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff070: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff078: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff080: 0x14 0x84 0x04 0x08 0x41 0x41 0x41 0x41
0xbffff088: 0x42 0x42 0x42 0x42 0x42 0x42 0x42 0x42
0xbffff090: 0x2c 0xf3 0xff 0xbf 0x00 0xf0 0xff 0xb7
When I did further testing and added an additional 4 bytes (4 C's), everything showed up properly in memory as well as in the print statement.
(gdb) run $( python -c "print('A'*72+'BBBB'+'CCCC')" )
Starting program: /home/ubuntu/Desktop/test $( python -c "print('A'*72+'BBBB'+'CCCC')" )
Breakpoint 2, 0x08048473 in getName (name=0xbffff300 "") at sample1.c:7
7 printf("Your name is: %s \n", myName);
(gdb) c
Continuing.
Your name is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBCCCC
Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
(gdb) x/150xb $sp-140
0xbffff02c: 0x54 0x82 0x04 0x08 0x41 0x41 0x41 0x41
0xbffff034: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff03c: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff044: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff04c: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff054: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff05c: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff064: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff06c: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
0xbffff074: 0x41 0x41 0x41 0x41 0x42 0x42 0x42 0x42
0xbffff07c: 0x43 0x43 0x43 0x43 0x00 0xf3 0xff 0xbf
0xbffff084: 0x00 0xf0 0xff 0xb7 0xab 0x84
Here is the code:
#include <stdio.h>
#include <string.h>
void getName (char* name) {
char myName[64];
strcpy(myName, name);
printf("Your name is: %s \n", myName);
}
int main (int argc, char* argv[]) {
getName(argv[1]);
return 0;
}
A disassembly of getName, which shows that 88 bytes were reserved on the stack:
Reading symbols from test...done.
(gdb) disas getName
Dump of assembler code for function getName:
0x0804844d <+0>: push %ebp
0x0804844e <+1>: mov %esp,%ebp
0x08048450 <+3>: sub $0x58,%esp
0x08048453 <+6>: mov 0x8(%ebp),%eax
0x08048456 <+9>: mov %eax,0x4(%esp)
0x0804845a <+13>: lea -0x48(%ebp),%eax
0x0804845d <+16>: mov %eax,(%esp)
0x08048460 <+19>: call 0x8048320 <strcpy@plt>
0x08048465 <+24>: lea -0x48(%ebp),%eax
0x08048468 <+27>: mov %eax,0x4(%esp)
0x0804846c <+31>: movl $0x8048530,(%esp)
0x08048473 <+38>: call 0x8048310 <printf@plt>
0x08048478 <+43>: leave
0x08048479 <+44>: ret
End of assembler dump.
Upvotes: 0
Views: 219
Reputation: 47573
Unoptimized code may see extra padding on the stack because of inefficiencies, but most often padding is a result of the compiler trying to align data on the stack. GCC generally tries to allocate arrays on addresses evenly divisible by 16.
After EBP is pushed, 0x58 bytes (88 bytes) are allocated on the stack. We can see that the buffer starts at EBP-0x48 because of this instruction:
lea -0x48(%ebp),%eax
The address EBP-0x48 is then used to set the parameters on the stack for both the call to strcpy and the call to printf. 0x48 = 72 bytes, despite the buffer being 64 bytes, so there are an additional 8 bytes of padding. Why is the padding there? Because the compiler has tried to ensure that the beginning of the myName buffer is on a 16-byte boundary.
GCC can keep track of what is on the stack, but an important piece of information about alignment is derived from the calling convention. This is 32-bit code, so that is the i386 System V ABI as GCC implements it on Linux (GCC's default of -mpreferred-stack-boundary=4 keeps the stack 16-byte aligned), which says that upon a call to a function (in this case getName) the stack must be 16-byte aligned. The call instruction pushes 4 bytes for a return address and then EBP is pushed for an additional 4. The compiler knows that after the PUSH EBP the stack is misaligned by 8 bytes. 64 + 8 bytes of padding + 4 for EBP + 4 for the return address = 80, and 80 is evenly divisible by 16 (16*5=80). The use of 8 bytes wasn't arbitrary.
In the GDB output you can see the myName array starts on a hexadecimal address ending in 0. Any hexadecimal address that ends in 0 is evenly divisible by 16, and you can see the buffer starts at 0xbffff040:
0xbffff038: 0x50 0xf0 0xff 0xbf 0x54 0x82 0x04 0x08
0xbffff040: 0x41 0x41 0x41 0x41 0x41 0x41 0x41 0x41
With all that being said, if you are looking to overwrite the return address, it will be at an offset from the beginning of myName that is equal to 64 (array size) + 8 (padding) + 4 (EBP on stack) = 76 bytes. You will have to write 76 bytes of data before reaching the point where you can replace the return address.
Note: You may wonder why the myName array has an additional 16 bytes beneath it on the stack (88-72=16 bytes). That space is where the compiler places the argument values for function calls like strcpy and printf. It also ensures that those calls are made with a 16-byte aligned stack, as the calling convention requires.
I confirmed the following observations by reproducing exactly what you saw on my own Ubuntu 14.04 system.
You were also wondering why, when you passed 72 A's and 4 B's, there were 4 unexpected bytes in the buffer:
0xbffff080:[0x14 0x84 0x04 0x08] 0x41 0x41 0x41 0x41
0xbffff088: 0x42 0x42 0x42 0x42 0x42 0x42 0x42 0x42
I've marked the 4 bytes with []. You are right that you might expect those 4 bytes to be 0x41 (the letter A) like the rest. What has happened is that although the input you gave on the command line was 76 characters (72+4), strcpy appended a NUL (\0) on the end as a 77th character. This overwrote the low byte of the return address with 0! You used the c command to continue running after the breakpoint, and the program stopped when it hit a segmentation fault. The RET instruction didn't return to where you expected in main; it returned to a slightly lower location in memory because of the NUL byte written into the return address. What you didn't see were all the instructions that executed after the RET and placed data back onto the stack. That included writing 32 bits of data into what was once your myName array.
When you wrote 72 A's, 4 B's, and 4 C's, you ended up overwriting the return address with CCCC, and you got a segmentation fault when the RET tried to start executing code at 0x43434343, as seen here:
0x43434343 in ?? ()
0x43434343 wasn't a valid address where you had execute permissions, so it faulted. Because no further code ran after the RET, the program didn't have a chance to overwrite the myName array. This explains why the buffer wasn't overwritten as it was in the previous test.
Upvotes: 2