Reputation: 51
Hi all, I have an interesting question about memory alignment for arrays in C. My OS is 32-bit Ubuntu, and I compile with gcc -S -fno-stack-protector.
Code:
char array1[5] = "aaaaa";
char array2[8];
array2[0] = 'b';
The assembly code:
pushl %ebp
movl %esp, %ebp # esp and ebp now point to the same word
subl $16, %esp # move esp down by 16 bytes
movl $1633771873, -5(%ebp) # input "aaaa" (1633771873 == 0x61616161)
movb $97, -1(%ebp) # input 'a'
movb $98, -13(%ebp) # input 'b'
movl $0, %eax
leave
I used GDB to inspect the memory: %ebp is efe8, %esp is efd8, &buf1 is efe3, and &buf2 is efdb (buf1 and buf2 are array1 and array2 in the snippet above).
In GDB, x/4bd 0xbfffefd8 shows
0xbfffefd8: 9 -124 4 98
x/bd 0xbfffefd8 shows
0xbfffefd8: 9
and x/bd 0xbfffefdb shows
0xbfffefdb: 98
So the memory looks like this:
## high address ##
   ?                    efe8  <-- ebp
  97   97   97   97     efe4
   0  -80   -5   97(a)  efe0
   0    0    0    0     efdc
   9 -124    4   98(b)  efd8  <-- esp
   ^             ^
   |             |
  efd8          efdb
Now my questions are:
Why is 'b' stored at efdb while %esp is efd8? I think 'b' should also be at efd8, because that is the start of the 4-byte word. Furthermore, if I keep filling more 'b's into buf2, which starts at efdb, it looks like it can only hold 5 of them, not 8. How come? And what about the '\0'? The same thing happens with buf1: it starts at efe3, not efe0. What kind of alignment is this? It does not make sense to me.
I also do not see this instruction, which I have seen in other programs:
andl $-16, %esp # this aligns esp to a 16-byte boundary
When does the andl instruction show up and when not? It is very common, so I expected to see it in every program.
From the assembly code above, I could not see any memory alignment. Is that always true? My understanding is that assembly is just a translation of readable high-level code into less readable code that still carries exactly the same meaning, so char[5] is not translated in a way that takes memory alignment into account. In that case the alignment should happen at run time. Am I right? But GDB shows exactly the same layout as the assembly code: no alignment at all.
Thanks.
Upvotes: 2
Views: 1127
Reputation: 628
I see nothing wrong here. TL;DR: char arrays have 1-byte alignment, so the compiler is right.
Digging a bit further: on my 64-bit machine, using GCC 7 with the -m32 option, I ran and debugged the same code and got the same results:
(gdb) x/4bd $esp+12
0xffffcdd4: 97 97 97 97
(gdb) x/4bd $esp+8
0xffffcdd0: 0 -48 -7 97
(gdb) x/4bd $esp+4
0xffffcdcc: 0 0 0 0
(gdb) x/4bd $esp+0
0xffffcdc8: 41 85 85 98
The addresses differ, of course, and that's fine. Now, let me try to explain.
First, $esp is aligned to 4 bytes, as expected:
(gdb) p $esp
$9 = (void *) 0xffffcdc8
So far, so good. Now, since char arrays have an alignment of 1 by default, let's try to figure out what happened at compile time. First, the compiler saw array1[5] and put it on the stack, but because it is 5 bytes wide it had to extend into a 2nd dword. So the first dword is full of 'a', while just 1 byte of the 2nd dword is used. Then array2[8] is placed immediately after (or before, depending on how you look at things) array1[5]. It spans 3 dwords (3 + 4 + 1 bytes), ending in the dword pointed to by $esp, so all 8 bytes are there; they just do not start on a dword boundary.
So, we have:
[esp + 0] <3 bytes of garbage /* no var */>, 'b' /* array2 */,
[esp + 4] 0x0, 0x0, 0x0, 0x0, /* still array2 */
[esp + 8] <3 bytes of garbage /* still array2 */>, 'a' /* array1 */,
[esp + 12] 'a', 'a', 'a', 'a', /* still array1 */.
If you add a char[2] array after array2, you'll see it placed in the same dword that $esp points to, still leaving 1 byte of garbage between $esp and your array3[2].
The compiler is absolutely allowed to do that. If you want your char arrays to be aligned to 4 bytes (but you need a good reason for that!), you have to use a compiler-specific attribute such as:
__attribute__ ((aligned(4)))
Upvotes: 2