Yeezus

Reputation: 402

How is this string in an array represented in assembly when a C program is compiled using the gcc -S option?

This is a C program that has been compiled to assembly using gcc -S. How is the string "Hello, world" represented in the generated assembly?

This is the C code:

    #include <stdio.h>

    int main(void) {

        char st[] = "Hello, world";
        printf("%s\n", st);

        return 0;
    }

Here's the assembly code:

            .intel_syntax noprefix
            .text
            .globl  main

    main:
            push    rbp
            mov     rbp, rsp
            sub     rsp, 32
            mov     rax, QWORD PTR fs:40
            mov     QWORD PTR [rbp-8], rax
            xor     eax, eax
            movabs  rax, 8583909746840200520
            mov     QWORD PTR [rbp-32], rax
            mov     DWORD PTR [rbp-24], 1684828783
            mov     BYTE PTR [rbp-20], 0
            lea     rax, [rbp-32]
            mov     rdi, rax
            call    puts
            mov     eax, 0
            mov     rdx, QWORD PTR [rbp-8]
            xor     rdx, QWORD PTR fs:40
            je      .L3
            call    __stack_chk_fail
    .L3:
            leave
            ret

Upvotes: 1

Views: 130

Answers (2)

chqrlie

Reputation: 144780

You are using a local buffer in function main, initialized from a string literal. The compiler implements this initialization by writing the 13 bytes of the array at [rbp-32] with 3 mov instructions: the first stores 8 bytes via rax (the 64-bit constant is loaded with movabs first), the second stores an immediate directly because the value fits in 32 bits, and the third stores a single byte.

8583909746840200520 in decimal is 0x77202c6f6c6c6548 in hex, corresponding to the bytes "Hello, w" in little-endian order. 1684828783 is 0x646c726f, the bytes "orld". The third mov sets the final '\0' byte. Hence the buffer contains "Hello, world".
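
To check the decoding above, here is a minimal sketch that replays the three stores in portable C. The helper names store_le and rebuild_string are made up for this illustration and do not appear in the gcc output. Each store writes the least significant byte first, which is what a little-endian mov to memory does.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the low n bytes of value into dst, least significant byte first,
   mimicking a little-endian store such as mov QWORD PTR [rbp-32], rax. */
void store_le(unsigned char *dst, uint64_t value, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = (unsigned char)(value >> (8 * i));
    }
}

/* Rebuild the 13-byte array from the three immediates in the listing.
   out must have room for at least 13 bytes. */
void rebuild_string(char *out) {
    unsigned char *p = (unsigned char *)out;
    store_le(p,      UINT64_C(8583909746840200520), 8); /* [rbp-32]: 8 bytes */
    store_le(p + 8,  UINT64_C(1684828783),          4); /* [rbp-24]: 4 bytes */
    store_le(p + 12, 0,                             1); /* [rbp-20]: '\0'    */
}
```

Calling rebuild_string on a 13-byte buffer and passing that buffer to puts should print the same text as the original program, assuming an ASCII execution character set.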

This string is then passed to puts for output to stdout.

Note that gcc optimized the call printf("%s\n", st); into puts(st);! By the way, clang performs the same optimization.
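
As a rough illustration of that rewrite, the two hypothetical functions below (names invented here) produce the same output. The transformation is only valid because the return value of printf is unused, so the sketch keeps both functions void. Whether a given compiler version actually emits a call to puts for the first one is best confirmed by looking at its -S output.

```c
#include <stdio.h>

/* What the source code asks for: format a string followed by a newline. */
void print_with_printf(const char *s) {
    printf("%s\n", s);
}

/* What the compiler may emit instead: puts writes the string and
   appends the newline itself, with no format parsing at run time. */
void print_with_puts(const char *s) {
    puts(s);
}
```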

Upvotes: 6

Peter Cordes

Reputation: 364503

Interesting.

If you'd written const char *str = "...", gcc would have passed puts a pointer to the string literal sitting in the .rodata section, as in this godbolt link. (Well spotted by chqrlie that gcc is optimizing printf to puts.)

Your code forces the compiler to make a writable copy of the string literal, because you use it to initialize a non-const char[]. (Actually, even with const char str[], gcc still generates it on the fly from mov-immediates. clang-3.7 spots the chance to optimize, though.)

Interestingly, gcc encodes the string as immediate data in the instructions, rather than copying it into the buffer from a stored copy. If the array had been global, it would have just been sitting in the regular .data section, not .rodata.
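
The three ways of holding the string discussed here can be put side by side in a small sketch. All names below are hypothetical, and the section placement in the comments follows the description above rather than anything the C standard guarantees. Compiling this with gcc -S is a way to compare what each declaration turns into.

```c
#include <stddef.h>

/* Global non-const array: the bytes are stored in a writable data section. */
char global_copy[] = "Hello, world";

/* Pointer to the literal: no copy is made, and the literal itself
   must not be modified through this pointer. */
const char *literal_ptr = "Hello, world";

/* Local array: every call builds its own writable copy on the stack,
   so modifying it is legal and does not affect any other copy. */
char first_char_after_edit(void) {
    char st[] = "Hello, world";
    st[0] = 'J';
    return st[0];
}

/* The local array holds 12 characters plus the terminating '\0'. */
size_t local_array_size(void) {
    char st[] = "Hello, world";
    return sizeof st;
}
```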


Also, in general, avoid using main() to look at compiler optimization. gcc marks it as "cold" on purpose and optimizes it less. This is an advantage for real programs, which do their real work in other functions. In this case, renaming main makes no difference. But usually, if you're looking at how gcc optimizes something, it's best to write a function that takes some args and uses them. Then you don't have to worry about gcc seeing that the inputs or loop bounds are compile-time constants, either.
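
A minimal sketch of that advice: instead of putting the code in main with constant data, put it in a function whose inputs arrive as arguments. The function count_char is a made-up example, not something from the question. Because s and c are unknown at compile time, the loop cannot be folded away when this file is compiled on its own with gcc -S.

```c
#include <stddef.h>

/* Count how many times c occurs in the NUL-terminated string s.
   Both inputs are run-time values, so the compiler has to emit
   the real loop rather than a precomputed constant. */
size_t count_char(const char *s, char c) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c) {
            n++;
        }
    }
    return n;
}
```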

Upvotes: 2
