Reputation: 671
How does a C program get started?
Upvotes: 67
Views: 22151
Reputation: 54325
The operating system calls the main() function. Eventually.
The Executable and Linkable Format (ELF), which many Unix-like operating systems use, defines an entry point address and an INIT address. The entry point is where the program begins to run after the OS finishes its exec() call (for a dynamically linked program, the dynamic loader runs first and then jumps there). On a Linux system the entry point is _start in the .text section. The INIT address is _init in the .init section; it runs during startup, before main() is called.
The C compiler links startup code and a standard library into every application, and these provide the initialization and entry points that the operating system expects. That startup code then calls main().
Here is my C source code for the example:
#include <stdio.h>

int main() {
    puts("Hello world!");
    return 0;
}
From objdump -d:
Disassembly of section .init:
0000000000001000 <_init>:
1000: f3 0f 1e fa endbr64
1004: 48 83 ec 08 sub $0x8,%rsp
1008: 48 8b 05 d9 2f 00 00 mov 0x2fd9(%rip),%rax # 3fe8 <__gmon_start__>
100f: 48 85 c0 test %rax,%rax
1012: 74 02 je 1016 <_init+0x16>
1014: ff d0 callq *%rax
1016: 48 83 c4 08 add $0x8,%rsp
101a: c3 retq
Disassembly of section .text:
0000000000001060 <_start>:
1060: f3 0f 1e fa endbr64
1064: 31 ed xor %ebp,%ebp
1066: 49 89 d1 mov %rdx,%r9
1069: 5e pop %rsi
106a: 48 89 e2 mov %rsp,%rdx
106d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
1071: 50 push %rax
1072: 54 push %rsp
1073: 4c 8d 05 66 01 00 00 lea 0x166(%rip),%r8 # 11e0 <__libc_csu_fini>
107a: 48 8d 0d ef 00 00 00 lea 0xef(%rip),%rcx # 1170 <__libc_csu_init>
1081: 48 8d 3d c1 00 00 00 lea 0xc1(%rip),%rdi # 1149 <main>
1088: ff 15 52 2f 00 00 callq *0x2f52(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
108e: f4 hlt
108f: 90 nop
0000000000001140 <frame_dummy>:
1140: f3 0f 1e fa endbr64
1144: e9 77 ff ff ff jmpq 10c0 <register_tm_clones>
From readelf -h you can see the Entry point address that matches _start:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1060
Start of program headers: 64 (bytes into file)
Start of section headers: 17416 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 36
Section header string table index: 35
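The same relationship can be checked from inside a running program. The following is a minimal sketch, assuming Linux with glibc 2.16 or later: getauxval(AT_ENTRY) reports the entry point the kernel recorded for the process, and _start is the symbol supplied by the C runtime startup object. The function name entry_point_is_start is made up for this illustration.

```c
#include <elf.h>       /* AT_ENTRY */
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval (Linux, glibc 2.16+) */

/* _start is not written by us; it comes from the startup object
   (crt1.o or Scrt1.o) that the compiler driver links in. */
extern void _start(void);

/* Hypothetical helper: returns 1 if the entry point recorded by the
   kernel for this process is the address of _start, else 0. */
int entry_point_is_start(void)
{
    uintptr_t entry = (uintptr_t)getauxval(AT_ENTRY);
    return entry == (uintptr_t)&_start;
}
```

In an ordinary Linux executable built with GCC or Clang, calling entry_point_is_start() from main() should return 1. For a position-independent executable the runtime address differs from the 0x1060 shown by readelf, because the loader adds a base address to both values.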
From readelf -d:
Dynamic section at offset 0x2dc8 contains 27 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x1000
0x000000000000000d (FINI) 0x11e8
0x0000000000000019 (INIT_ARRAY) 0x3db8
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x3dc0
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x3a0
0x0000000000000005 (STRTAB) 0x470
0x0000000000000006 (SYMTAB) 0x3c8
0x000000000000000a (STRSZ) 130 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x3fb8
0x0000000000000002 (PLTRELSZ) 24 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x5e0
0x0000000000000007 (RELA) 0x520
0x0000000000000008 (RELASZ) 192 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000000000001e (FLAGS) BIND_NOW
0x000000006ffffffb (FLAGS_1) Flags: NOW PIE
0x000000006ffffffe (VERNEED) 0x500
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x4f2
0x000000006ffffff9 (RELACOUNT) 3
0x0000000000000000 (NULL) 0x0
You can see that INIT is equal to the address of _init.
There is a whole array of function pointers in INIT_ARRAY also. See objdump -s -j .init_array c-test:
c-test: file format elf64-x86-64
Contents of section .init_array:
3db8 40110000 00000000 @.......
You can see that address 0x3db8 is the same as INIT_ARRAY in the dynamic section.
The address 0x1140 (read the bytes 40110000 in little-endian order) is the function frame_dummy that you can see in the disassembly, which then calls register_tm_clones and who knows what else.
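You can add your own entry to that array. The sketch below assumes GCC or Clang, whose constructor attribute is a compiler extension rather than standard C; on ELF targets it normally places a pointer to the function in .init_array. The names before_main and init_call_count are invented for the example.

```c
/* Incremented by the constructor below. */
static int init_calls;

/* GCC/Clang extension: the startup code calls this function before
   main(), alongside frame_dummy and any other initializers. */
__attribute__((constructor))
static void before_main(void)
{
    init_calls++;
}

/* Number of times the constructor has run so far. */
int init_call_count(void)
{
    return init_calls;
}
```

Called from main(), init_call_count() returns 1 because the constructor has already run by then. Dumping .init_array for such a program with objdump should show a second pointer next to the one for frame_dummy.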
The code for initialization is in a set of files named crtbegin.o and crtend.o (and variants of those names). The __libc_start_main function is defined in libc.so.6. The crtbegin.o and crtend.o files are part of GCC, while libc.so.6 is part of the C library (glibc on most Linux systems). That code does various things necessary for a C program, such as setting up stdin, stdout, and global and static variables.
The following article describes quite well what it does in Linux (taken from an answer below with fewer votes): http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html
I believe someone else's answer already described what Windows does.
Upvotes: 56
Reputation: 60065
Ultimately, it is the operating system. Usually there is some intermediate code between the real entry point and the main function; it is inserted by the compiler and linker.
Some details (related to Windows): the PE file has a header called IMAGE_OPTIONAL_HEADER, which has the field AddressOfEntryPoint. This is the address, relative to the image base, of the first code byte in the file that will be executed.
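To make the field concrete, here is a rough sketch that pulls AddressOfEntryPoint out of a PE image already loaded into a byte buffer. It relies only on the documented layout: e_lfanew at offset 0x3C of the DOS header, the 4-byte "PE\0\0" signature, the 20-byte COFF file header, and AddressOfEntryPoint 16 bytes into the optional header. The helper names read_le32 and pe_entry_point are made up for the example and are not Windows APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: stores AddressOfEntryPoint (a relative virtual
   address) in *entry_rva and returns 0, or returns -1 if the buffer
   does not look like a PE image. */
int pe_entry_point(const unsigned char *image, size_t size,
                   uint32_t *entry_rva)
{
    /* The DOS header starts with "MZ" and is at least 0x40 bytes. */
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    /* e_lfanew: file offset of the PE signature. */
    uint32_t pe_off = read_le32(image + 0x3c);

    /* Need signature (4) + COFF header (20) + 16 bytes of optional
       header + the 4-byte field itself. */
    if (pe_off > size || size - pe_off < 4 + 20 + 16 + 4)
        return -1;
    if (memcmp(image + pe_off, "PE\0\0", 4) != 0)
        return -1;

    *entry_rva = read_le32(image + pe_off + 4 + 20 + 16);
    return 0;
}
```

The value returned is relative to the image base, so the loader adds the base address at which the image is mapped before jumping to it.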
Upvotes: 26
Reputation: 560
Probably the best information for your question can be found at the following link, the best one I have come across to date: http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html
Upvotes: 5
Reputation: 4186
The operating system calls a function included in the C runtime (CRT) and linked into your executable. Call this "CRT main."
CRT main does a few things, the two most important of which, at least in C++, are to run through an array of global C++ classes and call their constructors, and to call your main() function and give its return value to the shell.
The Visual C++ CRT main does a few more things, if memory serves. It configures the memory allocator, important if using the Debug CRT to help find memory leaks or bad accesses. It also calls main within a structured exception handler that catches bad memory access and other crashes and displays them.
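The two core steps of "CRT main" (run the registered initializers, then call main() and hand its return value back) can be sketched in plain C. This is only an illustration under simplifying assumptions, not the code of any real runtime: crt_main_sketch, sample_init, sample_main and run_sample are all invented names, and a real CRT would pass the result to exit() instead of returning it.

```c
#include <stddef.h>

typedef void (*init_fn)(void);
typedef int (*main_fn)(int argc, char **argv);

/* Stand-in for a global that needs setup before main() runs. */
static int initialized;

static void sample_init(void)
{
    initialized = 1;
}

/* Stand-in for the user's main(): reports -1 if init was skipped. */
static int sample_main(int argc, char **argv)
{
    (void)argv;
    return initialized ? argc : -1;
}

/* Hypothetical "CRT main": call every initializer in order, then the
   user's main, and return its exit status to the caller. */
int crt_main_sketch(const init_fn *inits, size_t count,
                    main_fn user_main, int argc, char **argv)
{
    for (size_t i = 0; i < count; i++)
        inits[i]();
    return user_main(argc, argv);
}

/* Drive the sketch with one initializer and one argument. */
int run_sample(void)
{
    const init_fn inits[] = { sample_init };
    char *argv[] = { "prog", NULL };
    return crt_main_sketch(inits, 1, sample_main, 1, argv);
}
```

Here run_sample() returns 1 (the argc value), which shows that the initializer ran before the stand-in main was called.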
Upvotes: 5
Reputation: 40309
The operating system calls main, indirectly. There will be an entry point address in the executable file that leads to main (see the Unix ABI for more information).
But, who calls the operating system?
The central processing unit, on the "RESET" signal, (which is also asserted at power on), will begin looking in some ROM at a given address (say, 0xffff) for its instructions.
Typically there will be some sort of jump instruction out to the BIOS, which gets the memory chips configured, the basic hard drive drivers loaded, etc, etc. Then the Boot Sector of the hard drive is read, and the next bootloader is started, which loads the file containing the basic information of how to read, say, an NTFS partition and how to read the kernel file itself. The kernel environment will be set up, the kernel loaded, and then - and then! - the kernel will be jumped to for execution.
After all that hard work has been done, the kernel can then proceed to load our software.
Upvotes: 11
Reputation: 25834
Note that in addition to the answers already posted, it is also possible for you to call main yourself. Generally this is a bad idea reserved for obfuscated code.
Upvotes: 4
Reputation: 405
http://coding.derkeiler.com/Archive/C_CPP/comp.lang.c/2008-04/msg04617.html
Upvotes: 9