carl.hiass

Reputation: 1774

Return values in main vs _start

Note: similar questions have already been answered here, and I want to point them out:

However, this question is more about how each of them returns a value and how the two relate to each other (which I don't think the questions above fully cover).


What are the differences between _start and main? It seems to me that ld uses _start, whereas gcc uses main as the entry point. The other difference I've noticed is that main seems to return its value in %rax, whereas _start returns its value in %rbx.

The following is an example of the two ways I'm seeing this:

.globl _start
_start:
    mov $1, %rax
    mov $2, %rbx
    int $0x80

And to run it:

$ as script.s -o script.o; ld script.o -o script; ./script; echo $?
2

And the other way:

.globl main
main:
    mov $3, %rax
    ret

And to run it:

$ gcc script.s -o script; ./script; echo $?
3

What is the difference between these two methods? Does main automatically invoke _start somewhere, or how do they relate to each other? Why does one return its value in rbx whereas the other returns it in rax?

Upvotes: 1

Views: 2571

Answers (2)

old_timer

Reputation: 71546

_start is the entry point for the binary. main is the entry point for the C code.

_start is specific to a toolchain; main() is specific to a language.

You can't simply start executing compiled C code. You need a bootstrap: some code that sets up the minimum that a high-level language requires. Other languages have a longer list of requirements, but for C you need three things, provided by the loader if you are on an operating system, by the bootstrap, or by both: a valid stack pointer so that there is a stack, the read/write global data (often called .data) set to its initial values, and the zero-initialized data (often called .bss) zeroed. Then the bootstrap can call main().

Because most code runs on some operating system, and the operating system loads that code into RAM, it doesn't need a hard entry-point requirement of the kind you have when booting a processor, where there is a fixed entry address or a fixed vector table address. So the GNU tools are flexible enough, and some operating systems are flexible enough, that the entry point doesn't have to be the first machine code in the binary. That doesn't mean the label _start defines the entry point per se: you still have to tell the linker what the entry point is, for example with ENTRY(_start) if you use a linker script for GNU ld. But the tools do expect to find a label called _start, and if the linker doesn't find one it issues a warning and keeps going.

main() is specific to the C language: it is the C entry point, the label the bootstrap calls once it has done its job and is ready to run the compiled C code.

If the code is loaded into RAM, and both the binary file format and the operating system's loader support it, the entry point can be anywhere in the binary; its address is recorded in the binary file.

You can kind of think of _start as the entry point into the binary and main as the entry point into the compiled C code.

The return for a C function is defined by the calling convention that the C compiler uses. Compiler authors are free to do whatever they want, but these days they usually conform to a convention defined for the target (ARM, x86, MIPS, etc.). That calling convention defines exactly how to return a value depending on its type, so int main() returns an int one way, while float myfun() might follow a different rule within the same convention.

The return from a binary, if you can return at all, is defined by the operating system or operating environment, independently of the high-level language. So on a Mac with an x86 processor the rule may be one thing, on Windows on x86 another, on Ubuntu Linux on the same x86 another, on BSD another, and (though probably not) on Mint Linux yet another.

The rules and system calls are specific to the operating system, not to the processor or the computer, and certainly not to the high-level language, which does not touch the operating system directly anyway (that is handled in bootstrap or library code, not in high-level language code). On a number of operating systems you are supposed to make a system call rather than simply return a value in a register, but clearly the operating system needs to be robust enough to handle an improper return from a malformed binary. Alternatively, it may allow a return without an exit system call as a legal way to finish, in which case it defines a rule for how to return without a system call.

As for how _start and main relate (it is _start that ends up calling main, not the other way around), you can easily see this yourself:

int main ( void )
{
    return(5);
}

readelf shows:

  Entry point address:               0x500

objdump shows (not the whole output here)

Disassembly of section .init:

00000000000004b8 <_init>:
 4b8:   48 83 ec 08             sub    $0x8,%rsp
 4bc:   48 8b 05 25 0b 20 00    mov    0x200b25(%rip),%rax        # 200fe8 <__gmon_start__>
 4c3:   48 85 c0                test   %rax,%rax
 4c6:   74 02                   je     4ca <_init+0x12>
 4c8:   ff d0                   callq  *%rax
 4ca:   48 83 c4 08             add    $0x8,%rsp
 4ce:   c3                      retq   

...

Disassembly of section .text:

00000000000004f0 <main>:
 4f0:   b8 05 00 00 00          mov    $0x5,%eax
 4f5:   c3                      retq   
 4f6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 4fd:   00 00 00 

...

0000000000000500 <_start>:
 500:   31 ed                   xor    %ebp,%ebp
 502:   49 89 d1                mov    %rdx,%r9
 505:   5e                      pop    %rsi
 506:   48 89 e2                mov    %rsp,%rdx
 509:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
 50d:   50                      push   %rax
 50e:   54                      push   %rsp
 50f:   4c 8d 05 6a 01 00 00    lea    0x16a(%rip),%r8        # 680 <__libc_csu_fini>
 516:   48 8d 0d f3 00 00 00    lea    0xf3(%rip),%rcx        # 610 <__libc_csu_init>
 51d:   48 8d 3d cc ff ff ff    lea    -0x34(%rip),%rdi        # 4f0 <main>
 524:   ff 15 b6 0a 20 00       callq  *0x200ab6(%rip)        # 200fe0 <__libc_start_main@GLIBC_2.2.5>
 52a:   f4                      hlt    
 52b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

So you can see everything I mentioned above. The entry point for the binary is not at the beginning of the binary. The entry point (for the binary) is _start, somewhere in the middle of the binary. And somewhere after _start (not necessarily as close as seen here; it could be buried under other nested calls) main is called from the bootstrap code. Here the address of main is handed to __libc_start_main, which makes the actual call. In this case .data, .bss and the stack are assumed to be set up by the loader, not by the bootstrap, before the C entry point is called.

So in this case, which is typical, _start is the entry point for the binary, and at some point after it has bootstrapped for C it calls the C entry point main(). As the programmer, though, you control which linker script and bootstrap are used, so you don't have to use _start as the entry point; you can create your own (it certainly can't be main(), though, unless you are not fully supporting C, and possibly with other exceptions related to the operating system).

Upvotes: 2

Peter Cordes

Reputation: 364248

TL;DR: function return values and system-call arguments use separate registers because they're completely unrelated.


When you compile with gcc, it links CRT startup code that defines a _start. That _start (indirectly) calls main, and passes main's return value (which main leaves in EAX) to the exit() library function. (Which eventually makes an exit system call, after doing any necessary libc cleanup like flushing stdio buffers.)

See also "Return vs Exit from main function in C": this is exactly analogous to what you're doing, except you're using _exit(), which bypasses libc cleanup, instead of exit(). See also "Syscall implementation of exit()".

An int $0x80 system call takes its argument in EBX, as per the 32-bit system-call ABI (which you shouldn't be using in 64-bit code). It's not a return value from a function, it's the process exit status. See "Hello, world in assembly language with Linux system calls?" for more about system calls.

Note that _start is not a function; it can't return in that sense because there's no return address on the stack. You're taking a casual description like "return to the OS" and conflating that with a function's "return value". You can call exit from main if you want, but you can't ret from _start.

EAX is the return-value register for int-sized values in the function-calling convention. (The high 32 bits of RAX are ignored because main returns int. But also, $? exit status can only get the low 8 bits of the value passed to exit().)


Upvotes: 11
