Reputation: 1873

How does the linker decide where the code execution will start from? [Embedded]

As a beginner in embedded C programming, I am curious why every program (every one in my experience) starts execution with the main() function. It is as if the linker recognizes main() and puts the address of that "special" function at the location the reset vector points to.

Upvotes: 1

Answers (6)

John Bollinger

Reputation: 180286

C defines different specifications for code that will run in a "hosted" environment and code that will run in a "freestanding" environment. Most programmers will go their whole careers without ever having to deal with a freestanding environment, but most of the exceptions are among those who work with embedded programming, kernel programming, boot loaders, and other software that runs on bare metal.

In a hosted environment, C specifies that program execution starts with a call to main(). That does not preclude preliminary setup performed by the system before that call, but that's outside the scope of the specification. The C compiler and / or linker is responsible for arranging for that to happen; details are implementation dependent.

In a freestanding implementation, on the other hand, the program entry point is determined in a manner chosen by the implementation. There might not be a main() function, and if there is one then its signature does not need to match those permitted to programs run in hosted environments.
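
As a small illustration of the distinction, the standard macro `__STDC_HOSTED__` (C99 and later) reports which kind of implementation is compiling the code. The helper names `is_hosted` and `entry_point_rule` below are made up for this sketch and are not part of any library.

```c
/* Returns 1 when compiled by a hosted implementation, 0 otherwise.
   __STDC_HOSTED__ is predefined by conforming C99-and-later compilers. */
int is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__ == 1
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: summarizes which start-up rule applies. */
const char *entry_point_rule(void)
{
    return is_hosted() ? "main() is where execution starts"
                       : "the implementation chooses the entry point";
}
```

With GCC, for example, building with `-ffreestanding` is one way to see the macro evaluate to 0.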

Upvotes: 5

old_timer

Reputation: 71536

To meet the standard, or at least programmers' expectations, several things have to be in place before main() runs: .bss cleared, compile-time initialized variables (globals with an = something, for example) set up, a C library, and other fun things. So you have a chicken-and-egg problem: how can C code that relies on those assumptions also be the code that satisfies them? It can't. There is other code, often assembly but possibly C written with the knowledge that the assumptions do not hold yet, sometimes called bootstrap code. It doesn't matter whether this is an embedded system or an application running on an operating system: there is some glue between the first instructions of that "program" and main(). If you disassemble something the GNU tools created, you can see this execution path between a label named _start and main. Other toolchains may name their entry point differently.

On a microcontroller, or anywhere you might be on bare metal (the BIOS on a PC, the startup code that launches the RTOS/OS), the bare minimum, if you don't care about some of the requirements/assumptions of C, is loading the stack pointer and branching to main(). Zeroing .bss and copying .data from flash to its proper home in RAM are the next two things you need to get closer to the C language requirements, and in some embedded systems those are all the steps you get.
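
A rough sketch of those two steps, written so it runs on a desktop: the arrays stand in for flash and RAM, and every name here (`flash_data_image`, `ram_data`, `ram_bss`, `app_entry`, `simulated_reset`) is invented for illustration. On a real target the region bounds would come from linker-script symbols whose names depend on the toolchain, and the stack pointer would be loaded first by assembly or by the hardware.

```c
#include <stddef.h>

#define DATA_SIZE 4
#define BSS_SIZE  8

/* Initial values of .data as they would be stored in flash. */
static const unsigned char flash_data_image[DATA_SIZE] = { 1, 2, 3, 4 };
static unsigned char ram_data[DATA_SIZE]; /* run-time home of .data */
static unsigned char ram_bss[BSS_SIZE];   /* .bss, expected to read as zero */

/* Mimic the arbitrary RAM contents a device may have at power-up. */
void scramble_ram(void)
{
    size_t i;
    for (i = 0; i < DATA_SIZE; i++) ram_data[i] = 0xAA;
    for (i = 0; i < BSS_SIZE; i++)  ram_bss[i]  = 0xAA;
}

/* Stand-in for main(): its result is only right once RAM is prepared. */
int app_entry(void)
{
    return ram_data[0] + ram_data[3] + ram_bss[0];
}

/* The two preparation steps, then the branch to the first C function.
   Loading the stack pointer is omitted; it cannot be expressed in portable C. */
int simulated_reset(void)
{
    size_t i;
    for (i = 0; i < DATA_SIZE; i++) ram_data[i] = flash_data_image[i]; /* copy .data */
    for (i = 0; i < BSS_SIZE; i++)  ram_bss[i]  = 0;                   /* zero .bss */
    return app_entry();
}
```

Plain loops are used instead of memcpy/memset because at this point a C library may not be usable yet.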

Probably other processors do too, but ARM Cortex-M hardware can load the stack pointer and branch to an address by itself on reset (reset always branches to an address or runs code from some known address). Further, the interrupt system saves state for you, so you don't need to wrap asm around interrupt service routines written in C (or use a compiler-specific declaration that does the same thing). That would have been your next question anyway: 1) reset to C code, 2) interrupts to C code. So the interrupt vector table can hold the addresses of C functions directly. A nice feature of that product line.
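
To make the Cortex-M idea concrete, here is a desktop simulation, not real startup code. It models only the first four vector-table entries (initial stack pointer, reset, NMI, HardFault), and `simulate_reset` and `fake_stack` are hypothetical names for this sketch. On real hardware the entries are 32-bit words placed at the start of the table by the linker script, and the handler names are whatever your vendor's startup file uses.

```c
#include <stddef.h>

typedef void (*handler_t)(void);

/* First entries of a Cortex-M style vector table. */
struct vector_table {
    void     *initial_sp;  /* entry 0: value loaded into the stack pointer */
    handler_t reset;       /* entry 1: address execution starts from */
    handler_t nmi;         /* entry 2 */
    handler_t hard_fault;  /* entry 3 */
};

static unsigned char fake_stack[256]; /* stand-in for the RAM stack region */
static int reset_ran;

/* Ordinary C functions used directly as vector-table entries. */
void Reset_Handler(void)     { reset_ran = 1; }
void NMI_Handler(void)       { }
void HardFault_Handler(void) { }

static const struct vector_table vectors = {
    fake_stack + sizeof fake_stack, /* stack grows down from the top */
    Reset_Handler,
    NMI_Handler,
    HardFault_Handler
};

struct cpu_state { void *sp; handler_t pc; };

/* What the hardware does at reset, expressed as C for illustration:
   fetch entry 0 into SP, fetch entry 1 into PC, start executing. */
struct cpu_state simulate_reset(const struct vector_table *vt)
{
    struct cpu_state cpu;
    cpu.sp = vt->initial_sp;
    cpu.pc = vt->reset;
    cpu.pc();
    return cpu;
}
```

Because the hardware fills in the stack pointer itself, the reset entry can point at a plain C function, which is what the paragraph above describes.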

Use the toolchain's disassembler and examine the code from the entry point to main(). Some toolchains, certainly in the past, would make assumptions when they saw main() specifically and add extra code, so sometimes you see some other C function name used as the first C function to keep the toolchain from linking in other stuff.

Clifford hit the nail on the head, though: the linker is simply looking for unresolved symbols, one being main and, with a GNU toolchain, another being _start. It links in stuff it already knows about, or that you have provided on the command line, until all the labels are resolved.

Upvotes: 0

Clifford

Reputation: 93476

The linker links a module for processor and runtime environment initialisation. That module is entered from the reset vector. In the gcc toolchain, the module is normally called crt0.o and is built from the source crt0.s (assembly code). Your toolchain may vary, but some sort of start-up code will be linked, and the source should be available for customisation.

The start-up code will typically perform hardware initialisation such as configuring the PLL for the desired clock speed, and initialising a memory controller if external memory is used. The C runtime initialisation requires the setting of the stack pointer, and the initialisation of global static data, and possibly runtime library initialisation - heap and stdio initialisation for example. For C++ it also invokes the constructors for any global static objects. Finally main() is called.
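
The last part of that sequence (run the initialisers, then call main()) can be sketched as a walk over a table of function pointers. This is a simplified model with invented names (`init_array`, `ctor_a`, `ctor_b`, `app_main`, `run_startup`); a real toolchain builds the table for you (GNU toolchains collect such pointers in a section commonly named `.init_array`), and the hardware and data initialisation steps described above would come first.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Records the order in which things run, so the sequence can be checked. */
static int call_order[4];
static size_t call_count;

static void record(int id)
{
    if (call_count < sizeof call_order / sizeof call_order[0])
        call_order[call_count++] = id;
}

/* Stand-ins for global static constructors. */
static void ctor_a(void) { record(1); }
static void ctor_b(void) { record(2); }

/* Stand-in for main(). */
static int app_main(void) { record(3); return 0; }

/* Table of initialisers; assumed here to be written by hand. */
static const init_fn init_array[] = { ctor_a, ctor_b };

/* Tail end of the start-up code: every initialiser runs, then "main". */
int run_startup(void)
{
    size_t i;
    for (i = 0; i < sizeof init_array / sizeof init_array[0]; i++)
        init_array[i]();
    return app_main();
}
```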

Note that it is not the linker specifically that knows about main(); main is simply an unresolved symbol referenced by the runtime start-up module. If your program did not have a main(), it would fail to link.

You could of course modify the start-up code to use a symbol other than main(), but main() is what the language standard defines as the entry point (for hosted implementations).

Some application frameworks or environments may appear to not have a main(); for example in the RTOS VxWorks, applications start at usrAppInit(), but in fact that is simply because main() is defined in the VxWorks library.

The linker locates the start-up code according to either directives in the assembly source, or within the linker script; toolchains may differ.

On ARM Cortex-M devices, the initial stack pointer is defined in the vector table and loaded automatically; as a consequence, it is possible for these devices to run C code directly from reset (albeit in a somewhat limited environment), and allows much of the runtime environment initialisation to be written in C rather than assembler.

Upvotes: 2

Russ Schultz

Reputation: 2689

Each processor and tool chain is different. Generally, though, they're set up so that the entry point to the run time library (many times _start) is reached from the reset vector. The run time library prepares the processor state, clears .bss memory, initializes .data memory, maybe sets up the heap, makes a few callouts to allow customization of the startup, then calls all global constructors (if C++), before finally jumping to main().

It's a mix of hardware requirements, tool chain assumptions, run time library, and system code. You can trim a lot of it out, because the only real requirement for C is that you have a stack. The rest is library code you may or may not use.

Upvotes: 1

Eugene Sh.

Reputation: 18331

It is not the linker; it is the processor that decides. On power-up the instruction pointer is set to a predefined memory address, usually the same as the reset interrupt vector. The linker's part, done at build time, is placing the branch instruction to the startup code at that address.

Upvotes: 2

PineForestRanch

Reputation: 473

Usually a linker script creates a special section which is mapped to the reset vector and includes a jump/goto instruction to the C startup code, which, in turn, calls main().

Upvotes: 5
