Reputation: 121
I am planning to use C to write a small kernel and I really don't want it to bloat with unnecessary instructions.
I have two C files called main.c and hello.c. I compile and link them using the following GCC command:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o
I dump the .text section using the following objdump command:
objdump -w -j .text -D -mi386 -Maddr16,data16,intel main.o
and get the following dump:
00001000 <main>:
1000: 67 66 8d 4c 24 04 lea ecx,[esp+0x4]
1006: 66 83 e4 f0 and esp,0xfffffff0
100a: 67 66 ff 71 fc push DWORD PTR [ecx-0x4]
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 51 push ecx
1016: 66 83 ec 04 sub esp,0x4
101a: 66 e8 10 00 00 00 call 1030 <hello>
1020: 90 nop
1021: 66 83 c4 04 add esp,0x4
1025: 66 59 pop ecx
1027: 66 5d pop ebp
1029: 67 66 8d 61 fc lea esp,[ecx-0x4]
102e: 66 c3 ret
00001030 <hello>:
1030: 66 55 push ebp
1032: 66 89 e5 mov ebp,esp
1035: 90 nop
1036: 66 5d pop ebp
1038: 66 c3 ret
My question is: why is the machine code on the following lines generated? I can see that the subtraction and the addition cancel each other out, but why are they generated at all? I don't have any variables to allocate on the stack. I'd also appreciate a source to read about the usage of ECX here.
1016: 66 83 ec 04 sub esp,0x4
1021: 66 83 c4 04 add esp,0x4
main.c
extern void hello();
void main(){
hello();
}
hello.c
void hello(){}
lscript.ld
SECTIONS{
.text 0x1000 : {*(.text)}
}
Upvotes: 3
Views: 900
Reputation: 47573
As I mentioned in my comments:
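The first few lines (plus the push ecx) are there to ensure the stack is aligned on a 16-byte boundary, which is required by the Linux System V i386 ABI. The pop ecx and lea before the ret in main undo that alignment work. The sub esp,0x4 and add esp,0x4 pair you asked about is part of the same scheme: after the and, 12 bytes are pushed (the copy of the return address, ebp and ecx), so 4 more bytes of padding bring esp back to a 16-byte boundary at the call to hello.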
@RossRidge has provided a link to another Stack Overflow answer that details this quite well.
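The arithmetic behind that alignment code can be sketched in a few lines of C. This is only an illustration of the masking and padding, not something GCC emits; align_down and pad_to_boundary are hypothetical helper names, and both assume the boundary is a power of two.

```c
#include <stdint.h>

/* Round value down to a multiple of boundary (boundary must be a power of
   two). With boundary = 16 the mask is 0xfffffff0, the same constant used
   by "and esp,0xfffffff0" in the dump. */
uint32_t align_down(uint32_t value, uint32_t boundary)
{
    return value & ~(boundary - 1u);
}

/* Number of padding bytes needed after pushed_bytes have been pushed so
   that the total is again a multiple of boundary. */
uint32_t pad_to_boundary(uint32_t pushed_bytes, uint32_t boundary)
{
    return (boundary - pushed_bytes % boundary) % boundary;
}
```

For the main in the question, three 4-byte pushes follow the and, so pad_to_boundary(12, 16) gives the 4 bytes that sub esp,0x4 reserves.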
In this case you seem to be doing real mode development. GCC isn't well suited for this, but it can work, and I will assume you know what you are doing. I mention some of the pitfalls of using -m16 in this Stack Overflow answer, where I included this warning about real mode development with GCC:
There are so many pitfalls in doing this that I recommend against it.
If you remain undeterred and wish to continue, you can do a few things to minimize the code. The 16-byte alignment of the stack at the point a function call is made is part of the more recent Linux System V i386 ABIs. Since you are generating code for a non-Linux environment, you can change the stack alignment to 4 bytes using the compiler option -mpreferred-stack-boundary=2. The GCC manual says:
-mpreferred-stack-boundary=num
Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits).
If we add that to your GCC command:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2
we get:
00001000 <main>:
1000: 66 55 push ebp
1002: 66 89 e5 mov ebp,esp
1005: 66 e8 04 00 00 00 call 100f <hello>
100b: 66 5d pop ebp
100d: 66 c3 ret
0000100f <hello>:
100f: 66 55 push ebp
1011: 66 89 e5 mov ebp,esp
1014: 66 5d pop ebp
1016: 66 c3 ret
Now all the extra code that aligned the stack on a 16-byte boundary has disappeared. We are left with the typical frame pointer prologue and epilogue code. This usually takes the form of push ebp and mov ebp,esp at function entry and pop ebp before the return. We can remove these with the -fomit-frame-pointer option, which the GCC manual describes as:
The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.
If we add that option:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o -mpreferred-stack-boundary=2 -fomit-frame-pointer
we get:
00001000 <main>:
1000: 66 e8 02 00 00 00 call 1008 <hello>
1006: 66 c3 ret
00001008 <hello>:
1008: 66 c3 ret
You can then optimize for size with -Os. The GCC manual says this:
-Os
Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
This has the side effect that main will be placed into a section called .text.startup. If we display both sections with objdump -w -j .text -j .text.startup -D -mi386 -Maddr16,data16,intel main.o we get:
Disassembly of section .text:
00001000 <hello>:
1000: 66 c3 ret
Disassembly of section .text.startup:
00001002 <main>:
1002: e9 fb ff jmp 1000 <hello>
If you have functions in separate objects you can alter the calling convention so the first 3 Integer class parameters are passed in registers rather than the stack. The Linux kernel uses this method as well. Information on this can be found in the GCC documentation:
regparm (number)
On the Intel 386, the regparm attribute causes the compiler to pass arguments number one to number if they are of integral type in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments will continue to be passed all of their arguments on the stack.
I wrote a Stack Overflow answer with code that uses __attribute__((regparm(3))) that may be a useful source of further information.
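As a minimal sketch of what that looks like in source (this is not the code from the linked answer; FASTCALL3 and sum3 are hypothetical names), the attribute goes on both the prototype and the definition so that callers in other objects agree on the convention:

```c
/* regparm only exists on 32-bit x86 targets (which includes -m16 builds),
   so the macro expands to nothing elsewhere. FASTCALL3 and sum3 are
   hypothetical names used for illustration. */
#if defined(__i386__)
#define FASTCALL3 __attribute__((regparm(3)))
#else
#define FASTCALL3
#endif

/* Prototype as it would appear in a header shared by all objects. */
FASTCALL3 int sum3(int a, int b, int c);

/* With regparm(3) on i386, a arrives in EAX, b in EDX and c in ECX
   instead of on the stack. */
FASTCALL3 int sum3(int a, int b, int c)
{
    return a + b + c;
}
```

If a caller sees a prototype without the attribute, it will push the arguments on the stack while the callee reads registers, so keeping the declaration in one shared header is the safer design.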
I recommend you consider compiling each object individually rather than all together. This is also advantageous because it is easier to do in a Makefile later on.
If we look at your command line with the extra options mentioned above you'd have:
gcc -Wall -T lscript.ld -m16 -nostdlib main.c hello.c -o main.o \
-mpreferred-stack-boundary=2 -fomit-frame-pointer -Os
I recommend you do it this way:
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer main.c -o main.o
gcc -c -Os -Wall -m16 -ffreestanding -nostdlib -mpreferred-stack-boundary=2 \
-fomit-frame-pointer hello.c -o hello.o
The -c option (I added it at the beginning) forces the compiler to generate only the object file from the source, without linking. You will also notice that -T lscript.ld has been removed, since the linker script only matters at link time. We have created .o files above. We can now use GCC to link them together:
gcc -ffreestanding -nostdlib -Wl,--build-id=none -m16 \
-Tlscript.ld main.o hello.o -o main.elf
The -nostdlib option keeps the C runtime startup files and standard libraries out of the link (-ffreestanding is aimed at the compiler, telling it not to assume a hosted C environment), and -Wl,--build-id=none tells the linker not to add a build ID note to the executable. For this to really work you'll need a slightly more complex linker script that places .text.startup before .text. This script also adds the .data, .rodata and .bss sections. The /DISCARD/ section drops exception handling data and other unneeded information.
ENTRY(main)
SECTIONS{
.text 0x1000 : SUBALIGN(4) {
*(.text.startup);
*(.text);
}
.data : SUBALIGN(4) {
*(.data);
*(.rodata);
}
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss);
}
. = ALIGN(4);
__bss_end = .;
/DISCARD/ : {
*(.eh_frame);
*(.comment);
*(.note.gnu.build-id);
}
}
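The script exports __bss_start and __bss_end, presumably so that startup code can clear the .bss region before any C code relies on zero-initialized globals. A minimal sketch of such a routine follows; zero_range is a hypothetical name, and the volatile qualifier is my own choice to keep GCC from replacing the loop with a call to memset, which a -nostdlib build would have to provide itself.

```c
/* Zero every byte in the half-open range [start, end).
   zero_range is a hypothetical helper, not part of the original answer. */
void zero_range(volatile unsigned char *start, volatile unsigned char *end)
{
    volatile unsigned char *p;

    for (p = start; p < end; ++p)
        *p = 0;
}

/* Assumed usage in the kernel, with the linker script above:
 *
 *   extern unsigned char __bss_start[], __bss_end[];
 *   zero_range(__bss_start, __bss_end);
 */
```

Declaring the linker symbols as arrays means their names evaluate to the addresses the script assigned, which is what the routine needs.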
If we look at a complete dump with objdump -w -D -mi386 -Maddr16,data16,intel main.elf we would see:
Disassembly of section .text:
00001000 <main>:
1000: e9 01 00 jmp 1004 <hello>
1003: 90 nop
00001004 <hello>:
1004: 66 c3 ret
If you want to convert main.elf to a binary file that you can place in a disk image and read (i.e. via BIOS interrupt 0x13), you can create it this way:
objcopy -O binary main.elf main.bin
If you dump main.bin with NDISASM using ndisasm -b16 -o 0x1000 main.bin you'd see:
00001000 E90100 jmp word 0x1004
00001003 90 nop
00001004 66C3 o32 ret
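If you want to check these dumps by hand, the jump target is the address of the next instruction plus the signed 16-bit displacement stored after the e9 opcode. A small sketch of that calculation, with rel16_target as a hypothetical helper name:

```c
#include <stdint.h>

/* Target of a near relative jump in 16-bit code: the address of the
   following instruction plus the displacement, wrapped to 16 bits.
   rel16_target is a hypothetical helper used for illustration. */
uint16_t rel16_target(uint16_t insn_addr, uint8_t insn_len, uint16_t disp)
{
    return (uint16_t)(insn_addr + insn_len + disp);
}
```

For the final binary, E90100 at 0x1000 is 3 bytes long with displacement 0x0001, so the target is 0x1004, which matches the disassembly. The earlier e9 fb ff at 0x1002 has displacement 0xfffb (that is, -5) and lands on 0x1000.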
I can't stress this enough: you should consider using a GCC cross compiler. The OSDev Wiki has information on building one. It also has this to say about why:
Why do I need a Cross Compiler?
You need to use a cross-compiler unless you are developing on your own operating system. The compiler must know the correct target platform (CPU, operating system), otherwise you will run into trouble. If you use the compiler that comes with your system, then the compiler won't know it is compiling something else entirely. Some tutorials suggest using your system compiler and passing a lot of problematic options to the compiler. This will certainly give you a lot of problems in the future and the solution is build a cross-compiler.
Upvotes: 10