Reputation: 6133
The way I understand the compilation process:
1) Preprocessing: All of your macros are replaced with their expansions, all comments are removed, etc. Your #include directives are replaced with the literal text of the files they name.
2) Compilation: Won't drill down too deep here, but the result is an assembly file for whatever architecture you are on.
3) Assembly: Takes the assembly file and converts it into binary instructions, i.e., machine code.
4) Linking: This is where I'm confused. At this point you have an executable. But if you actually run that executable what happens? Is the problem that you may have included *.h files, and those only contain function prototypes? So if you actually call one of the functions from those files, it won't have a definition and your program will crash?
If that's the case, what exactly does linking do, under the hood? How does it find the .c file associated with the .h that you included, and how does it inject that into your machine code? Doesn't it have to go through the whole compilation process again for that file?
Now, I've come to understand that there are two types of linking, dynamic and static. Is static when you actually recompile the source of the library for every executable you create? I don't quite understand how dynamic linking would work. So you compile one executable library that is shared by all of your processes that use it? How is that possible, exactly? Wouldn't it be outside of the address space of the processes trying to access it? Also, for dynamic linking, don't you still need to compile the library at some juncture in time? Is it just sitting there constantly in memory waiting to be used? When is it compiled?
Can you go through the above and clear up all of the misunderstandings, wrong assumptions there and substitute your correct explanation?
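To make my mental model concrete, here is roughly how I picture those four stages mapping onto individual gcc invocations (I'm not sure every toolchain splits them exactly this way):

gcc -E main.c -o main.i     # 1) preprocess: expand #includes and macros
gcc -S main.i -o main.s     # 2) compile: translate C to assembly
gcc -c main.s -o main.o     # 3) assemble: produce an object file
gcc main.o -o main          # 4) link: produce the final executable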
Upvotes: 8
Views: 2772
Reputation:
At this point you have an executable.
No. At this point, you have object files, which are not, in themselves, executable.
But if you actually run that executable what happens?
Something like this:
h2co3-macbook:~ h2co3$ clang -Wall -o quirk.o quirk.c -c
h2co3-macbook:~ h2co3$ chmod +x quirk.o
h2co3-macbook:~ h2co3$ ./quirk.o
-bash: ./quirk.o: Malformed Mach-o file
I told you it was not an executable.
Is the problem that you may have included *.h files, and those only contain function prototypes?
Pretty close, actually. A translation unit (.c file) is (generally) transformed to assembly/machine code that represents what it does. If it calls a function, then there will be a reference to that function in the file, but no definition.
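You can see this for yourself. Suppose quirk.c calls a function named do_stuff that is only declared in a header (the name is made up for the example). Then nm will list that symbol as undefined in the object file; the exact output format varies between platforms, but it looks roughly like this:

h2co3-macbook:~ h2co3$ clang -c quirk.c -o quirk.o
h2co3-macbook:~ h2co3$ nm quirk.o
0000000000000000 T _main
                 U _do_stuff

That U is an unresolved reference that the linker will later have to fill in.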
So if you actually call one of the functions from those files, it won't have a definition and your program will crash?
As I've stated, it won't even run. Let me repeat: an object file is not executable.
what exactly does linking do, under the hood? How does it find the .c file associated with the .h that you included [...]
It doesn't. It looks for other object files generated from .c files, and possibly libraries (which are essentially just collections of other object files).
And it finds them because you tell it what to look for. Assuming you have a project which consists of two .c files which call each other's functions, this won't work:
gcc -c file1.c -o file1.o
gcc -c file2.c -o file2.o
gcc -o my_prog file1.o
It will fail with a linker error: the linker won't find the definition of the functions implemented in file2.c (and file2.o). But this will work:
gcc -c file1.c -o file1.o
gcc -c file2.c -o file2.o
gcc -o my_prog file1.o file2.o
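For illustration, the two files could look something like this (the function name helper is invented for the example; in real code the declaration would usually live in a shared header):

/* file1.c */
void helper(void);   /* declared here, defined in file2.c */

int main(void)
{
    helper();        /* only a reference; the definition is in file2.o */
    return 0;
}

/* file2.c */
#include <stdio.h>

void helper(void)
{
    puts("hello from file2");
}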
[...] and how does it inject that into your machine code?
Object files contain stub references (usually in the form of function entry point addresses or explicit, human-readable names) to the functions they call. Then the linker looks at each library and object file, finds the references (and throws an error if a function definition couldn't be found), and substitutes the stub references with actual "call this function" machine-code instructions. (Yes, this is largely simplified, but without a specific architecture and a specific compiler/linker to talk about, it's hard to be more precise...)
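If you want to peek at those stub references on a typical ELF/x86-64 toolchain, objdump -r lists the relocation records the linker has to patch; the exact offsets and relocation types depend on your architecture, compiler and options, but the output looks roughly like this:

$ objdump -r file1.o

file1.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000009 R_X86_64_PLT32    helper-0x0000000000000004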
Is static when you actually recompile the source of the library for every executable you create?
No. Static linkage means that the machine code of the library's object files is actually copied/merged into your final executable. Dynamic linkage means that the library is loaded into memory once, and the aforementioned stub function references are resolved by the operating system when your executable is launched. No machine code from the library is copied into your final executable. (So here, the linker in the toolchain only does part of the job.)
The following may help you to achieve enlightenment: if you statically link an executable, it will be self-contained. It will run anywhere (on a compatible architecture anyway). If you link it dynamically, it will only run on a machine if that particular machine has all the libraries installed that the program references.
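Very roughly, the two flavours are produced like this on a typical Unix-like toolchain (the library and file names are invented for the example):

# static: archive the object files; their code is copied into the executable at link time
gcc -c mylib.c -o mylib.o
ar rcs libmylib.a mylib.o
gcc -o my_prog_static main.o libmylib.a

# dynamic: build a shared object; the executable only records that it needs libmylib
gcc -c -fPIC mylib.c -o mylib.o
gcc -shared -o libmylib.so mylib.o
gcc -o my_prog_dynamic main.o -L. -lmylib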
So you compile one executable library that is shared by all of your processes that use it? How is that possible, exactly? Wouldn't it be outside of the address space of the processes trying to access it?
The dynamic linker/loader component of the OS takes care of all that. The shared library is mapped into the virtual address space of every process that uses it, so from each process's point of view it isn't "outside" at all; the read-only pages holding its code can even be backed by the same physical memory for all of those processes.
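On Linux, for example, you can ask which shared libraries the loader will have to map into a process with ldd (output abridged; the paths and addresses will differ on your machine):

$ ldd ./my_prog
        linux-vdso.so.1 (0x00007ffc...)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
        /lib64/ld-linux-x86-64.so.2 (0x00007f...)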
Also, for dynamic linking, don't you still need to compile the library at some juncture in time?
As I've already mentioned: yes, it is already compiled. Then it is loaded at some point (typically when it's first used) into memory.
When is it compiled?
Some time before it could be used. Typically, a library is compiled, then installed to a location on your system so that the OS and the compiler/linker know about its existence, then you can start compiling (um, linking) programs that use that library. Not earlier.
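In concrete terms, the order of events could look like this on Linux (the names and install paths are just one common convention, not the only possibility):

# 1) someone (you, or a package manager) builds and installs the library
gcc -shared -fPIC -o libmylib.so mylib.c
sudo cp libmylib.so /usr/local/lib/
sudo cp mylib.h /usr/local/include/
sudo ldconfig                      # refresh the dynamic loader's cache

# 2) only now can you link programs against it
gcc -o my_prog main.c -lmylib

# 3) at run time, the dynamic loader finds and maps libmylib.so
./my_prog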
Upvotes: 17