Reputation: 135
I made two programs to output two strings, one in assembly and the other one in C. This is the program in assembly:
.section .data
string1:
.ascii "Hola\0"
string2:
.ascii "Adios\0"
.section .text
.globl _start
_start:
pushl $string1
call puts
addl $4, %esp
pushl $string2
call puts
addl $4, %esp
movl $1, %eax
movl $0, %ebx
int $0x80
I build the program with
as test.s -o test.o
ld -dynamic-linker /lib/ld-linux.so.2 -o test test.o -lc
And the output is as expected
Hola
Adios
This is the C program:
#include <stdio.h>
int main(void)
{
puts("Hola");
puts("Adios");
return 0;
}
And I get the expected output, but when converting this C program to assembly with gcc -S (OS is Debian 32 bit), the generated assembly source does not show a null character in either string, as you can see here:
.file "testc.c"
.section .rodata
.LC0:
.string "Hola"
.LC1:
.string "Adios"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x7c,0x6
subl $4, %esp
subl $12, %esp
pushl $.LC0
call puts
addl $16, %esp
subl $12, %esp
pushl $.LC1
call puts
addl $16, %esp
movl $0, %eax
movl -4(%ebp), %ecx
.cfi_def_cfa 1, 0
leave
.cfi_restore 5
leal -4(%ecx), %esp
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 4.9.2-10) 4.9.2"
.section .note.GNU-stack,"",@progbits
My two questions are:
1) Why does the gcc-generated assembly code not append the null character at the end of both strings? I thought that C did this automatically.
2) If I skip the null characters in my hand-made assembly code, I get this output:
HolaAdios
Adios
I understand why I get the "HolaAdios" part at the first line, but why does the program end successfully after the "Adios" part if it is not null-terminated?
Upvotes: 8
Views: 1862
Reputation: 364512
Just to add a bit more detail:
Your second string is zero-terminated by chance, because there's nothing after it in your .data section. You dynamically link glibc, which also has a .data section which gets mapped into your process's address space. It's a private mapping, but I think it is mapped, not copied, so it's page-aligned. The rest of the page holding your executable's data segment is padded with zeros. (The ABI may not guarantee this, but Linux has to do something to avoid leaking kernel data).
When your executable is loaded into memory, the data segment is loaded separately from the text segment. See this answer about the difference between sections (which the linker cares about) and executable segments (which the program loader cares about).
Note that gcc puts string constants in the .rodata section, which the linker places in the text segment of the executable, along with the .text section: read-only so it can be shared between multiple processes running the same executable. Sections are aligned by default with padding, so even if you put your strings in .rodata without zero terminators, there would be a zero of padding after the 2nd.
There would be no such padding byte if the data happened to end exactly on the alignment boundary (e.g. length was a multiple of 16, or something).
BTW, you can confirm that there weren't any non-printing garbage characters after the string, using strace ./string-test. You can see: write(1, "Adios\n", 6) = 6
.string is a synonym for .asciz. The manual uses different language to describe the fact that they process backslash escape sequences, and append a zero-byte, but they do the same thing. The GNU assembler has a lot of synonyms for compatibility with many different Unix vendor-supplied assemblers, so it can be confusing to realize there's actually no difference when gcc uses .zero but clang uses .skip, or something like that.
I build the program with...
The commands you used will only work on a 32-bit system. On a 64-bit host, you'd build a 64-bit binary which still uses the 32-bit system call ABI. (And the 32-bit dynamic linker path, so it wouldn't even work by accident, even though static data addresses are in the low 32 bits, so could be passed to the 32-bit wrapper for sys_write.)
Also, I'd recommend calling your source file test.S. Capital-S is the usual extension for hand-written asm source. You can assemble and link the same way as you were doing manually with gcc -m32 -nostartfiles test.S -o test
See this Q&A for the full details on building asm on Linux: Assembling 32-bit binaries on a 64-bit system (GNU toolchain)
See also the x86 tag wiki for lots of interesting links.
Upvotes: 0
Reputation: 131
.string always appends a null terminator, as seen here. puts just continues until it sees a null byte. \x00 bytes are very common, so there must be one nearby, which is why it works (probably due to section alignment of .rodata).
Upvotes: 6