ratsafalig

Reputation: 502

How does dynamic linking know where to find the linked files?

I read the following about dynamic linking in a book:

For example, when executing this command:

gcc main.o lib.so

main.o doesn't copy any information from lib.so.

Instead, ld-linux.so copies information about lib.so into ld-linux.so.

main.o only tries to link the files when main.out is executed; it then asks ld-linux.so which files to link.


My question is simple: how exactly does ld-linux.so copy the information about lib.so?

It can't simply store a note saying "main.out links to lib.so", can it?

If it did, ld-linux.so itself would soon become enormous.

So I must be misunderstanding something.

Upvotes: 1

Views: 3825

Answers (2)

Glärbo

Reputation: 296

This is well described in the ld-linux.so(8) man page (man 8 ld-linux.so), maintained upstream by the Linux man-pages project.

In short, simplified a bit (ignoring preloaded libraries, ELF DT_RPATH/DT_RUNPATH, and various options in the binary that needs those dynamic libraries):

ld-linux.so looks the library up in the directories listed in the LD_LIBRARY_PATH environment variable, if it is defined.

If not defined, or not found there, ld-linux.so checks the /etc/ld.so.cache file: a binary cache, updated by ldconfig administration command (that is automatically run by your package manager whenever necessary), containing the paths to (most) known dynamic libraries.

If not found there, ld-linux.so checks if the library is found in the standard library directories.
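The three steps above can be sketched as a small simulation. This is only an illustration of the lookup order, not how glibc implements it: the function name resolve_library, the DEFAULT_DIRS list, and the dict standing in for /etc/ld.so.cache are all assumptions made for the example.

```python
import os
import posixpath

# Hypothetical default directories; the real list is built into the dynamic
# linker and differs between distributions and architectures.
DEFAULT_DIRS = ["/lib", "/usr/lib"]


def resolve_library(name, ld_library_path="", cache=None,
                    default_dirs=DEFAULT_DIRS, exists=os.path.exists):
    """Return (path, source) for the first match, or (None, None).

    cache stands in for /etc/ld.so.cache as a plain dict {name: path};
    the real cache is a binary file and is not parsed here.
    """
    # 1. Directories from LD_LIBRARY_PATH, in order (empty entries are
    #    skipped here for simplicity).
    for directory in filter(None, ld_library_path.split(":")):
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate, "LD_LIBRARY_PATH"
    # 2. The cache maintained by ldconfig.
    if cache and name in cache and exists(cache[name]):
        return cache[name], "ld.so.cache"
    # 3. The standard library directories.
    for directory in default_dirs:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate, "default directory"
    return None, None
```

Calling resolve_library("lib.so", os.environ.get("LD_LIBRARY_PATH", "")) reports which step, if any, would find the file. The exists parameter is there only so the order can be tried out without touching the real filesystem.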


Linux uses the ELF file format for binaries and dynamic libraries. This is a very structured format.

Whenever you execute a new ELF binary, at the very end, in Linux it boils down to an execve or execveat syscall (or exec_with_loader syscall on some architectures).

The Linux kernel opens the file, checking for proper permissions, and maps relevant parts of the ELF file into memory. (There is a module, binfmt_misc, for extending the types of files the kernel will execute. In addition to ELF files, the kernel recognizes #! at the very beginning of a file to indicate a script, followed by the path to the script interpreter that will be executed instead.)

If the ELF file was statically linked, the kernel lets the userspace continue execution at the ELF file start point. (Note that this is not the standard C library main(); the standard C library actually links in proper initialization and exit code.)

If the ELF file was dynamically linked, it has a PT_INTERP program header specifying the absolute path to the dynamic linker. (Note that a system can have several dynamic linkers installed; typically one for 64-bit binaries, and one for 32-bit binaries.) The kernel will map that into memory, and hand off execution to it instead.
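To see which dynamic linker a binary requests, the PT_INTERP program header can be read straight from the file. The following is a minimal sketch that assumes a 64-bit little-endian ELF file (as on x86-64); the function name read_interp is made up for this example, and readelf -l shows the same information.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(data):
    """Return the PT_INTERP path from a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF header: e_phoff at offset 32; e_phentsize and e_phnum at offset 54.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr,
        # p_paddr, p_filesz.
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, base)
        if p_type == PT_INTERP:
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # no PT_INTERP: statically linked
```

For example, read_interp(open("/bin/ls", "rb").read()) would typically return a path such as /lib64/ld-linux-x86-64.so.2 on an x86-64 glibc system, though the exact path depends on the distribution and C library.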

The dynamic linker will stay in memory for the lifetime of the process. It provides useful features exposed by including <dlfcn.h> (see man 3 dlsym in particular) and <link.h> (see man 3 dl_iterate_phdr). For example, you can dynamically load and unload new ELF libraries at any time. This is commonly used for plugins and plugin-type functionality.
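The run-time interface can be tried without writing any C, because Python's ctypes module goes through the same dynamic linker. A small sketch, assuming Linux; the helper name lookup_in_process is invented for the example.

```python
import ctypes
import os


def lookup_in_process(symbol):
    """Look up a C symbol among the objects already mapped into this process."""
    # CDLL(None) asks the dynamic linker for a handle on the running program
    # itself (dlopen(NULL) on Linux); attribute access then resolves the
    # name through dlsym.
    handle = ctypes.CDLL(None)
    return getattr(handle, symbol)


getpid = lookup_in_process("getpid")
getpid.restype = ctypes.c_int
# Compare the C library's answer with Python's own.
print(getpid() == os.getpid())
```

Passing a file name instead of None, as in ctypes.CDLL("libm.so.6"), makes the dynamic linker search for and map a library that was not loaded at startup, which is the mechanism plugins rely on.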

Not only does the dynamic linker find and map in memory all dynamically loaded libraries, and handle their relocation records and symbol tables, it also does some very useful things before handing execution off to the starting point of the original binary. For example, both the Linux dynamic linker and static linkers provide a way to execute functions after all dynamic libraries have been loaded but before main() is executed, by simply marking the functions __attribute__((constructor)). Similarly, functions marked __attribute__((destructor)) are executed after main() returns or exit() is called (but not if the process dies due to a signal, or uses _exit()/_Exit()).

Note that I say "map" above instead of "load". This is because the Linux kernel memory-maps the data from the storage to memory, instead of "loading" it in the traditional sense. Because of the page cache, this also means that no matter how many copies of a specific program or library you have running, only one copy actually resides in RAM (unless you do certain odd shenanigans, that is).
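On Linux these mappings are visible in /proc/<pid>/maps (see proc(5)). The following sketch lists the files mapped into the current process; the function name mapped_files is an assumption of the example, and the parsing only relies on the whitespace-separated field layout documented in proc(5).

```python
import os


def mapped_files(maps_text):
    """Return the distinct file paths named in /proc/<pid>/maps text, in order."""
    seen = []
    for line in maps_text.splitlines():
        # Fields: address range, permissions, offset, device, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            if fields[5] not in seen:
                seen.append(fields[5])
    return seen


if os.path.exists("/proc/self/maps"):  # only present on Linux
    with open("/proc/self/maps") as f:
        for path in mapped_files(f.read()):
            print(path)
```

The output normally includes the Python executable, the dynamic linker, and each shared library it mapped. The same library usually appears on several lines of the raw file, one per mapped segment, which is why the sketch removes duplicates.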

Finally, the Linux dynamic linker is actually a part of the C library, not the Linux kernel. For further details, go read the glibc runtime dynamic linker sources.

Upvotes: 4

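Instead, ld-linux.so copies information about lib.so into ld-linux.so.

This is wrong, at least on Debian/Buster or Ubuntu 20.04 for x86-64, since the dynamic linker file (/lib64/ld-linux-x86-64.so.2 on those systems) is owned by root and is not writable by an ordinary user (such as the one running gcc main.o lib.so). See credentials(7) and environ(7). The names of the required libraries are instead recorded in the executable itself, as DT_NEEDED entries in its dynamic section.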

Notice that both GNU binutils and GCC (and also the Linux kernel) are free software: you are allowed to download their source code and improve it. I recommend studying their source code.

Or at least use strace(1) or ltrace(1) or gdb(1) or pmap(1) (see also proc(5)) to understand what your gcc process is doing and what syscalls(2) are involved. Also use ldd(1) and readelf(1) and objdump(1) on your executable. See also elf(5).
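What readelf -d prints as NEEDED entries can also be read directly from the executable, which shows where the list of required libraries is actually stored. The following is a rough sketch under the assumption of a 64-bit little-endian ELF file; the function name needed_libraries is invented here, and elf(5) documents the structures involved.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5


def needed_libraries(data):
    """List the DT_NEEDED names from a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    loads, dynamic = [], None
    for i in range(e_phnum):
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # statically linked: no dynamic section

    # Each dynamic entry is a (tag, value) pair of 8-byte integers.
    offset, size = dynamic
    name_offsets, strtab_vaddr = [], None
    for k in range(0, size - 15, 16):
        tag, value = struct.unpack_from("<qQ", data, offset + k)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            name_offsets.append(value)  # offset into the string table
        elif tag == DT_STRTAB:
            strtab_vaddr = value
    if strtab_vaddr is None:
        return []

    # DT_STRTAB is a virtual address; translate it to a file offset
    # using the PT_LOAD segment that contains it.
    for vaddr, file_offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab = file_offset + (strtab_vaddr - vaddr)
            break
    else:
        raise ValueError("string table is not inside any PT_LOAD segment")

    names = []
    for off in name_offsets:
        end = data.index(b"\0", strtab + off)
        names.append(data[strtab + off:end].decode())
    return names
```

Running needed_libraries(open("main.out", "rb").read()) on the executable from the question should list the library names the static linker recorded. The dynamic linker reads these same entries at startup and then searches for each name.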

See also this draft report and the CHARIOT & DECODER projects and the Linux From Scratch website and Advanced Linux Programming, Drepper's paper How to write shared libraries, and the Linux Program Library HowTO.

Upvotes: 3
