squinky

Reputation: 13

How is memory assigned in C?

I am trying to understand how GDB works and how memory is allocated. When I run the following command, it is supposed to write 72 A's into memory, but when I count them in the memory dump, I find only 68 A's, along with 4 bytes of seemingly random data before the B's start. When I count the A's in the output of the print statement, all 72 are there.

0xbffff080: 0x14    0x84    0x04    0x08    0x41    0x41    0x41    0x41
0xbffff088: 0x42    0x42    0x42    0x42    0x42    0x42    0x42    0x42

Full command below.

(gdb) run $( python -c "print('A'*72+'BBBB')" )
Starting program: /home/ubuntu/Desktop/test $( python -c "print('A'*72+'BBBB')" )

Breakpoint 2, 0x08048473 in getName (
    name=0xbffff32c 'A' <repeats 72 times>, "BBBB") at sample1.c:7
7       printf("Your name is: %s \n", myName);
(gdb) c
Continuing.
Your name is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB 

Program received signal SIGSEGV, Segmentation fault.
0xbffff32c in ?? ()
(gdb) x/150xb $sp-140
0xbffff038: 0x50    0xf0    0xff    0xbf    0x54    0x82    0x04    0x08
0xbffff040: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff048: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff050: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff058: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff060: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff068: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff070: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff078: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff080: 0x14    0x84    0x04    0x08    0x41    0x41    0x41    0x41
0xbffff088: 0x42    0x42    0x42    0x42    0x42    0x42    0x42    0x42
0xbffff090: 0x2c    0xf3    0xff    0xbf    0x00    0xf0    0xff    0xb7

When I did further testing and added an additional 4 bytes (4 C's), everything showed up properly both in memory and in the print statement.

(gdb) run $( python -c "print('A'*72+'BBBB'+'CCCC')" )
Starting program: /home/ubuntu/Desktop/test $( python -c "print('A'*72+'BBBB'+'CCCC')" )

Breakpoint 2, 0x08048473 in getName (name=0xbffff300 "") at sample1.c:7
7       printf("Your name is: %s \n", myName);
(gdb) c
Continuing.
Your name is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBCCCC 

Program received signal SIGSEGV, Segmentation fault.
0x43434343 in ?? ()
(gdb) x/150xb $sp-140
0xbffff02c: 0x54    0x82    0x04    0x08    0x41    0x41    0x41    0x41
0xbffff034: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff03c: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff044: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff04c: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff054: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff05c: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff064: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff06c: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41
0xbffff074: 0x41    0x41    0x41    0x41    0x42    0x42    0x42    0x42
0xbffff07c: 0x43    0x43    0x43    0x43    0x00    0xf3    0xff    0xbf
0xbffff084: 0x00    0xf0    0xff    0xb7    0xab    0x84

Here is the code:

#include <stdio.h>
#include <string.h>

void getName (char* name) {
    char myName[64];
    strcpy(myName, name);
    printf("Your name is: %s \n", myName);
}

int main (int argc, char* argv[]) {
    getName(argv[1]);
    return 0;
}

A disassembly of getName, which shows 88 bytes (0x58) being reserved on the stack:

Reading symbols from test...done.
(gdb) disas getName
Dump of assembler code for function getName:
   0x0804844d <+0>: push   %ebp
   0x0804844e <+1>: mov    %esp,%ebp
   0x08048450 <+3>: sub    $0x58,%esp
   0x08048453 <+6>: mov    0x8(%ebp),%eax
   0x08048456 <+9>: mov    %eax,0x4(%esp)
   0x0804845a <+13>:    lea    -0x48(%ebp),%eax
   0x0804845d <+16>:    mov    %eax,(%esp)
   0x08048460 <+19>:    call   0x8048320 <strcpy@plt>
   0x08048465 <+24>:    lea    -0x48(%ebp),%eax
   0x08048468 <+27>:    mov    %eax,0x4(%esp)
   0x0804846c <+31>:    movl   $0x8048530,(%esp)
   0x08048473 <+38>:    call   0x8048310 <printf@plt>
   0x08048478 <+43>:    leave  
   0x08048479 <+44>:    ret    
End of assembler dump.

Upvotes: 0

Views: 219

Answers (1)

Michael Petch

Reputation: 47573

Unoptimized code may see extra padding on the stack because of inefficiencies, but most often padding is a result of the compiler trying to align data on the stack. GCC generally tries to allocate arrays on addresses evenly divisible by 16.

After EBP is pushed, 0x58 bytes (88 bytes) are allocated on the stack. We can see that the buffer starts at EBP-0x48 because of this instruction:

lea    -0x48(%ebp),%eax

The address EBP-0x48 is then used to set the parameters on the stack for both the call to strcpy and the call to printf. 0x48 = 72 bytes, even though the buffer is only 64 bytes, so there are an additional 8 bytes of padding. Why is the padding there? Because the compiler has tried to ensure that the beginning of the myName buffer is on a 16-byte boundary.

GCC can keep track of what is on the stack, but an important piece of information about alignment is derived from the calling convention (the 32-bit i386 System V ABI as GCC implements it on Linux), which says that the stack must be 16-byte aligned immediately before a function (in this case getName) is called. The call instruction pushes 4 bytes for a return address, and then EBP is pushed for an additional 4. The compiler knows that after the PUSH EBP the stack is misaligned by 8 bytes. 64 + 8 bytes of padding + 4 for EBP + 4 for the return address = 80. 80 is evenly divisible by 16 (16*5=80). The use of 8 bytes wasn't arbitrary.
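The arithmetic above can be written down as a small helper. This is only a sketch of the simplified model described in this answer, not GCC's actual frame-layout algorithm; the function name padding_for_alignment and the constant names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that sit between the 16-byte-aligned call site and the locals on
   i386: the 4-byte return address pushed by CALL and the 4-byte saved EBP. */
enum { RETURN_ADDR_BYTES = 4, SAVED_EBP_BYTES = 4 };

/* Hypothetical helper: smallest padding that makes
   buffer + padding + saved EBP + return address a multiple of `align`,
   so the buffer starts on an `align`-byte boundary. */
size_t padding_for_alignment(size_t buffer_size, size_t align)
{
    /* align must be a non-zero power of two */
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t used = buffer_size + SAVED_EBP_BYTES + RETURN_ADDR_BYTES;
    return (align - (used % align)) % align;
}
```

With the values from the question, padding_for_alignment(64, 16) gives 8, which matches the 8 bytes of padding seen between the 64-byte array and the saved EBP.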

In the GDB output you can see the myName array starts on a hexadecimal address ending in 0. Any hexadecimal address that ends in 0 is evenly divisible by 16 and you can see the buffer starts at 0xbffff040:

0xbffff038: 0x50    0xf0    0xff    0xbf    0x54    0x82    0x04    0x08
0xbffff040: 0x41    0x41    0x41    0x41    0x41    0x41    0x41    0x41

With all that being said, if you are looking to overwrite the return address, it will be at an offset from the beginning of myName equal to 64 (array size) + 8 (padding) + 4 (EBP on stack) = 76 bytes. You will have to write 76 bytes of data before reaching the point where you can replace the return address.
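The same offset can be read straight off the disassembly: the buffer sits at EBP-0x48, the saved EBP is at EBP+0, and the return address is at EBP+4. A minimal sketch of that calculation, assuming a conventional i386 frame with a pushed EBP (the function and constant names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Size of the saved frame pointer on i386 (assumption: the function
   starts with the usual push %ebp / mov %esp,%ebp prologue). */
enum { SAVED_FRAME_POINTER_SIZE = 4 };

/* Hypothetical helper: given the displacement N from "lea -N(%ebp)",
   return how many bytes separate the start of the buffer from the
   return address. */
size_t return_address_offset(size_t ebp_displacement)
{
    assert(ebp_displacement > 0);
    return ebp_displacement + SAVED_FRAME_POINTER_SIZE;
}
```

For this program, return_address_offset(0x48) is 76, the same number as 64 + 8 + 4.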

Note: You may wonder why the myName array has an additional 16 bytes beneath it on the stack (88-72=16 bytes). That space is where the compiler places the arguments for function calls like strcpy and printf, and it also ensures that those calls are made with a 16-byte aligned stack, to conform to the i386 System V ABI as GCC implements it on Linux.


Reason for Unusual Data in Middle of myName

I confirmed the following observations by reproducing exactly what you saw on my own Ubuntu 14.04 system.

You were also wondering about the fact that when you inserted 72 A's and 4 B's that you had 4 unexpected bytes in the buffer:

0xbffff080:[0x14    0x84    0x04    0x08]   0x41    0x41    0x41    0x41
0xbffff088: 0x42    0x42    0x42    0x42    0x42    0x42    0x42    0x42

I've marked the 4 bytes with []. You are right to expect those 4 bytes to be 0x41 (the letter A) like the rest. Although the input you gave on the command line was 76 characters (72+4), strcpy appended a NUL (\0) terminator as a 77th byte. This overwrote the low byte of the return address with 0! You used the c command to continue running after the breakpoint, and the debugger stopped when the program hit a segmentation fault. The RET instruction didn't return to where you expected in main; it returned to a slightly lower address because of the NUL byte written into the return address. What you didn't see were all the instructions that executed after the RET, which placed data back onto the stack. That included writing 32 bits of data into what was once your myName array.
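To make the effect of the terminating NUL concrete, here is a small model of the frame as a plain byte array. It is a sketch under assumptions: the layout (return address 76 bytes past the start of the buffer, little-endian) comes from the analysis above, while the function name and the original return address used in the usage note are invented for illustration, since the real value isn't visible in your dumps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model layout: bytes 0..71 buffer + padding, 72..75 saved EBP,
   76..79 return address, 80..83 the `name` argument slot. */
enum { BUF_TO_RET = 76, FRAME_BYTES = 84 };

/* Hypothetical helper: copy `input` (with its NUL terminator, as strcpy
   would) to the start of a model frame and return the modeled return
   address afterwards. */
uint32_t return_address_after_copy(const char *input, uint32_t original_ret)
{
    unsigned char frame[FRAME_BYTES];
    size_t len = strlen(input) + 1;      /* +1 for the NUL strcpy appends */
    assert(len <= FRAME_BYTES);          /* the model itself must not overflow */

    memset(frame, 0, sizeof frame);
    /* Store the original return address in little-endian byte order. */
    for (int i = 0; i < 4; i++)
        frame[BUF_TO_RET + i] = (unsigned char)(original_ret >> (8 * i));

    memcpy(frame, input, len);           /* bounded stand-in for strcpy */

    /* Read the (possibly clobbered) return address back. */
    uint32_t ret = 0;
    for (int i = 0; i < 4; i++)
        ret |= (uint32_t)frame[BUF_TO_RET + i] << (8 * i);
    return ret;
}
```

With 72 A's plus BBBB as input and a made-up original return address of 0x08048495, the function returns 0x08048400: only the low byte changed, so RET lands slightly below the intended address. Appending CCCC replaces all four bytes and yields 0x43434343.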

When you wrote 72 A's, 4 B's, and 4 C's you ended up overwriting the return address with CCCC and you got a segmentation fault when the RET tried to start executing code at 0x43434343 as seen here:

0x43434343 in ?? ()

0x43434343 wasn't a valid address where you had execute permissions, so it faulted. Because no further code ran after the RET, the program didn't have a chance to overwrite the myName array. This explains why the buffer wasn't overwritten as it was in the previous test.

Upvotes: 2
