Ali Atıl

Reputation: 121

gcc subtracting from esp before call

I am planning to use C to write a small kernel and I really don't want it to bloat with unnecessary instructions.

I have two C files which are called main.c and hello.c. I compile and link them using the following GCC command:

gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o

I dump the .text section using the following objdump command:

objdump -w -j .text -D -mi386 -Maddr16,data16,intel main.o

and get the following dump:

00001000 <main>:
    1000:   67 66 8d 4c 24 04       lea    ecx,[esp+0x4]
    1006:   66 83 e4 f0             and    esp,0xfffffff0
    100a:   67 66 ff 71 fc          push   DWORD PTR [ecx-0x4]
    100f:   66 55                   push   ebp
    1011:   66 89 e5                mov    ebp,esp
    1014:   66 51                   push   ecx
    1016:   66 83 ec 04             sub    esp,0x4
    101a:   66 e8 10 00 00 00       call   1030 <hello>
    1020:   90                      nop
    1021:   66 83 c4 04             add    esp,0x4
    1025:   66 59                   pop    ecx
    1027:   66 5d                   pop    ebp
    1029:   67 66 8d 61 fc          lea    esp,[ecx-0x4]
    102e:   66 c3                   ret    
00001030 <hello>:
    1030:   66 55                   push   ebp
    1032:   66 89 e5                mov    ebp,esp
    1035:   90                      nop
    1036:   66 5d                   pop    ebp
    1038:   66 c3                   ret  

My question is: why are the instructions on the following lines generated? I can see that the subtraction and the addition cancel each other out, but why are they there at all? I don't have any variables to allocate on the stack. I'd also appreciate a source to read about the usage of ECX.

1016:   66 83 ec 04             sub    esp,0x4
1021:   66 83 c4 04             add    esp,0x4

main.c

extern void hello();


void main(){
    hello();
}

hello.c

void hello(){}

lscript.ld

SECTIONS{

    .text 0x1000 : {*(.text)}
}

Upvotes: 3

Views: 900

Answers (1)

Michael Petch

Reputation: 47573

As I mentioned in my comments:

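The first few lines (plus the push ecx) are there to ensure the stack is aligned on a 16-byte boundary, which is required by the Linux System V i386 ABI. The sub esp,0x4 is padding: after the three 4-byte pushes it brings esp back to a multiple of 16 at the point of the call, and the add esp,0x4 removes that padding afterwards. The pop ecx and lea before the ret in main undo the alignment work.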

@RossRidge has provided a link to another Stackoverflow answer that details this quite well.
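The arithmetic behind that prologue can be modelled in a few lines of C. This is only a sketch of the esp bookkeeping, not code GCC emits; the function names align_down and esp_at_call are made up for illustration, and any starting value of esp you feed it is a hypothetical one.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack pointer down to a 2^log2_boundary byte boundary.
 * For log2_boundary = 4 this matches `and esp,0xfffffff0`. */
uint32_t align_down(uint32_t sp, unsigned log2_boundary)
{
    assert(log2_boundary < 32);
    return sp & ~((UINT32_C(1) << log2_boundary) - 1);
}

/* Model the value of esp at the `call hello` instruction in the first dump,
 * given the value esp had on entry to main. */
uint32_t esp_at_call(uint32_t esp_on_entry)
{
    uint32_t esp = align_down(esp_on_entry, 4); /* and  esp,0xfffffff0          */
    esp -= 4;                                   /* push DWORD PTR [ecx-0x4]     */
    esp -= 4;                                   /* push ebp                     */
    esp -= 4;                                   /* push ecx                     */
    esp -= 4;                                   /* sub  esp,0x4 (padding only)  */
    return esp;
}
```

For a hypothetical entry value of 0x7bfa, esp_at_call returns 0x7be0. Three 4-byte pushes alone would leave esp 12 bytes below the aligned address, so the extra 4 bytes of padding are what bring it back to a multiple of 16.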

In this case you seem to be doing real mode development. GCC isn't well suited for this but it can work and I will assume you know what you are doing. I mention some of the pitfalls of using -m16 in this Stackoverflow answer. I put this warning in that answer regarding real mode development with GCC:

There are so many pitfalls in doing this that I recommend against it.


If you remain undeterred and wish to continue, you can do a few things to minimize the code. The 16-byte alignment of the stack at the point a function call is made is part of the more recent Linux System V i386 ABIs. Since you are generating code for a non-Linux environment, you can reduce the stack alignment to 4 bytes with the compiler option -mpreferred-stack-boundary=2. The GCC manual says:

-mpreferred-stack-boundary=num

Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).

If we add that to your GCC command we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2:

00001000 <main>:
    1000:       66 55                   push   ebp
    1002:       66 89 e5                mov    ebp,esp
    1005:       66 e8 04 00 00 00       call   100f <hello>
    100b:       66 5d                   pop    ebp
    100d:       66 c3                   ret

0000100f <hello>:
    100f:       66 55                   push   ebp
    1011:       66 89 e5                mov    ebp,esp
    1014:       66 5d                   pop    ebp
    1016:       66 c3                   ret

Now all the extra code that aligned the stack on a 16-byte boundary has disappeared. We are left with the typical frame-pointer prologue and epilogue: push ebp and mov ebp,esp on entry, and pop ebp before the ret. We can remove these with the -fomit-frame-pointer option, described in the GCC manual as:

The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.

If we add that option we get gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2 -fomit-frame-pointer:

00001000 <main>:
    1000:       66 e8 02 00 00 00       call   1008 <hello>
    1006:       66 c3                   ret

00001008 <hello>:
    1008:       66 c3                   ret

You can then optimize for size with -Os. The GCC manual says this:

-Os

Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

This has a side effect that main will be placed into a section called .text.startup. If we display both with objdump -w -j .text -j .text.startup -D -mi386 -Maddr16,data16,intel main.o we get:

Disassembly of section .text:

00001000 <hello>:
    1000:       66 c3                   ret

Disassembly of section .text.startup:

00001002 <main>:
    1002:       e9 fb ff                jmp    1000 <hello>

If you have functions in separate objects you can alter the calling convention so the first 3 Integer class parameters are passed in registers rather than the stack. The Linux kernel uses this method as well. Information on this can be found in the GCC documentation:

regparm (number)

On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to number if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.

I wrote a Stackoverflow answer with code that uses __attribute__((regparm(3))) that may be a useful source of further information.
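As a rough illustration of the attribute, here is a minimal sketch. The names REGPARM3, sum3 and sum3_self_check are hypothetical, and the #if guard is my own addition so the file still compiles on non-i386 hosts, where the macro expands to nothing and the normal calling convention applies. The assert-based check is meant for a hosted test build only; a freestanding kernel would not have assert.h.

```c
#include <assert.h>

/* regparm is an i386-only attribute. Expand to nothing elsewhere so this
 * sketch still builds on other hosts (where it then changes nothing). */
#if defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* The declaration that other objects see must carry the attribute too,
 * otherwise caller and callee disagree about where the arguments live. */
REGPARM3 int sum3(int a, int b, int c);

/* On i386 with regparm(3), a, b and c arrive in EAX, EDX and ECX. */
REGPARM3 int sum3(int a, int b, int c)
{
    return a + b + c;
}

/* Hosted sanity check only. */
void sum3_self_check(void)
{
    assert(sum3(1, 2, 3) == 6);
}
```

In a multi-file build the declaration would normally live in a shared header so that every caller sees the same calling convention.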


Other Suggestions

I recommend you consider compiling each object individually rather than all together. This is also advantageous because it is easier to do in a Makefile later on.

If we look at your command line with the extra options mentioned above you'd have:

gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o \
    -mpreferred-stack-boundary=2 -fomit-frame-pointer -Os

I recommend you do it this way:

gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
    -fomit-frame-pointer main.c -o main.o
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
    -fomit-frame-pointer hello.c -o hello.o

The -c option (I added it to the beginning) forces the compiler to just generate the object file from the source and not to perform linking. You will also notice the -T lscript.ld has been removed. We have created .o files above. We can now use GCC to link all of them together:

gcc -ffreestanding -nostdlib -Wl,--build-id=none -m16 \
    -Tlscript.ld main.o hello.o -o main.elf

The -nostdlib option stops GCC from linking in the C runtime startup files and standard libraries, -ffreestanding tells the compiler that the code targets a freestanding environment, and -Wl,--build-id=none tells the linker not to emit a build-id note in the executable. For this to really work you'll need a slightly more complex linker script that places .text.startup before .text. This script also handles the .data, .rodata and .bss sections. The /DISCARD/ section removes exception handling data and other unneeded information.

ENTRY(main)
SECTIONS{

    .text 0x1000 : SUBALIGN(4) {
        *(.text.startup);
        *(.text);
    }
    .data : SUBALIGN(4) {
        *(.data);
        *(.rodata);
    }
    .bss : SUBALIGN(4) {
        __bss_start = .;
        *(COMMON);
        *(.bss);
    }
    . = ALIGN(4);
    __bss_end = .;

    /DISCARD/ : {
        *(.eh_frame);
        *(.comment);
        *(.note.gnu.build-id);
    }
}
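The __bss_start and __bss_end symbols in that script can be used from C to clear the .bss area at startup, which matters if the image is later loaded as a flat binary that does not carry zero-filled .bss contents. The following is only a sketch under that assumption; zero_range is a hypothetical helper name, and nothing in the original build calls it.

```c
/* Zero every byte in [start, end). The volatile-qualified pointer is a
 * deliberate choice: it discourages the compiler from replacing the loop
 * with a call to memset, which a -nostdlib build does not supply unless
 * you write one yourself. */
void zero_range(unsigned char *start, unsigned char *end)
{
    volatile unsigned char *p = start;
    while (p < end)
        *p++ = 0;
}

/* In the kernel build (assumed usage, not compiled here) the linker-script
 * symbols would be declared and passed in like this:
 *
 *     extern unsigned char __bss_start[], __bss_end[];
 *     zero_range(__bss_start, __bss_end);
 */
```

Keeping the loop in a function that takes plain pointers means it can be exercised on an ordinary buffer in a hosted test before it is wired up to the linker symbols.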

If we look at a complete OBJDUMP with objdump -w -D -mi386 -Maddr16,data16,intel main.elf we would see:

Disassembly of section .text:

00001000 <main>:
    1000:       e9 01 00                jmp    1004 <hello>
    1003:       90                      nop

00001004 <hello>:
    1004:       66 c3                   ret

If you want to convert main.elf to a binary file that you can place in a disk image and read (e.g. via BIOS interrupt 0x13), you can create it this way:

objcopy -O binary main.elf main.bin

If you dump main.bin with NDISASM using ndisasm -b16 -o 0x1000 main.bin you'd see:

00001000  E90100            jmp word 0x1004
00001003  90                nop
00001004  66C3              o32 ret

Cross Compiler

I can't stress this enough: you should consider using a GCC cross compiler. The OSDev Wiki has information on building one. It also has this to say about why:

Why do I need a Cross Compiler?

You need to use a cross-compiler unless you are developing on your own operating system. The compiler must know the correct target platform (CPU, operating system), otherwise you will run into trouble. If you use the compiler that comes with your system, then the compiler won't know it is compiling something else entirely. Some tutorials suggest using your system compiler and passing a lot of problematic options to the compiler. This will certainly give you a lot of problems in the future and the solution is build a cross-compiler.

Upvotes: 10
