C language: how are arrays aligned in memory on the stack?

Question

Hi all, I have an interesting question about memory alignment for arrays in C. My OS is 32-bit Ubuntu, and I compile with gcc using the -S and -fno-stack-protector options.

Code:

char array1[5] = "aaaaa";
char array2[8];
array2[0] = 'b';

The assembly code:

pushl   %ebp
movl    %esp, %ebp                  # esp and ebp now point to the same word
subl    $16, %esp                   # reserve 16 bytes: move esp 16 bytes lower
movl    $1633771873, -5(%ebp)       # store "aaaa" (0x61616161)
movb    $97, -1(%ebp)               # store 'a'
movb    $98, -13(%ebp)              # store 'b'
movl    $0, %eax
leave

I used GDB to inspect the memory:

%ebp is efe8,

%esp is efd8,

&buf1 is efe3,

&buf2 is efdb.

In GDB, if I run x/4bd 0xbfffefd8, it shows

0xbfffefd8:    9  -124   4  98

If I run x/bd 0xbfffefd8, it shows

0xbfffefd8:    9

If I run x/bd 0xbfffefdb, it shows

0xbfffefdb:    98

So the memory looks like this

## high address ##
?                       efe8 <-- ebp
97  97  97  97          efe4 
0  -80  -5  97(a)       efe0
0    0   0   0          efdc
9 -124   4  98(b)       efd8 <-- esp
^            ^
|            |
efd8       efdb

Now my questions are:

Why is the character 'b' (98) at efdb, while %esp is efd8? I thought 'b' should also be at efd8, because that is the start of the 4-byte word. Furthermore, if I keep writing more 'b's into buf2, which starts at efdb, it seems it can only hold 5 of them, not 8. How come? And what about the '\0'?

The same thing happens with buf1: it starts at efe3, not efe0. What kind of alignment is this? It does not make sense to me.

The assembly code doesn't show the 16-byte alignment that I have seen elsewhere, like this:

andl $-16, %esp     # this aligns esp to 16 boundary

When does the andl instruction appear, and when does it not? It is very common, so I expected to see it in every program.

I cannot see any memory alignment in the assembly code above. Is that always the case? My understanding is that the assembly code is just a translation of the (very readable) high-level code into less readable code that still carries exactly the same meaning, so char[5] is translated without taking memory alignment into account. In that case, the alignment would have to happen at run time. Am I right? But GDB shows exactly the same layout as the assembly code: no alignment at all.

Thanks.

vvaltchev · Accepted Answer

I see nothing wrong here. TL;DR: char arrays are aligned to 1 byte, so the compiler is right.

Digging a bit further: on my 64-bit machine, using GCC 7 with the -m32 option, I ran and debugged the same code and got the same results:

(gdb) x/4bd $esp+12
0xffffcdd4:     97      97      97      97
(gdb) x/4bd $esp+8 
0xffffcdd0:     0       -48     -7      97
(gdb) x/4bd $esp+4
0xffffcdcc:     0       0       0       0
(gdb) x/4bd $esp+0 
0xffffcdc8:     41      85      85      98

The addresses differ, of course, and that's fine. Now, let me try to explain. First, $esp is aligned to 4 bytes, as expected:

(gdb) p $esp
$9 = (void *) 0xffffcdc8

So far, so good. Now, because we know that char arrays have an alignment of 1 by default, let's work out what happened at compile time. First, the compiler saw array1[5] and put it on the stack, but because it is 5 bytes wide, it had to extend into a second dword. So the first dword is full of 'a's, while just 1 byte of the second dword is used. Then array2[8] is placed immediately after (or before, depending on how you look at it) array1[5]. It spans 3 dwords, ending in the dword pointed to by $esp.

So, we have:

[esp +  0] <3 bytes of garbage /* no var */>, 'b' /* array2 */,
[esp +  4] 0x0, 0x0, 0x0, 0x0, /* still array2 */
[esp +  8] <3 bytes of garbage /* still array2 */>, 'a' /* array1 */,
[esp + 12] 'a', 'a', 'a', 'a', /* still array1 */.

If you add a char[2] array after array2, you'll see it use the same dword pointed to by $esp, and there will still be 1 byte of garbage between $esp and your array3[2].

The compiler is absolutely allowed to do that. If you want your char arrays to be aligned to 4 bytes (but you need a good reason for that!), you have to use a compiler-specific attribute such as:

__attribute__ ((aligned(4)))
