padawanTony

Reputation: 1359

Linking files/headers in C

Let's say I have the following program (hello.c):

#include <stdio.h>
#include <math.h>

#define NAME "ashoka"


int main(int argc, char *argv[])
{
    printf("Hello, world! My name is %s\n", NAME);
}

So, as I understand it, the process of compiling this program is:

  1. Preprocessing: copies in the function declarations from stdio.h and math.h and replaces NAME with "ashoka".

    clang -E hello.c
    
  2. Compiling: will turn code into assembly code

    clang -S hello.c
    

    file produced: hello.s

  3. Assembling: transform assembly code to object code

    clang -c hello.s
    

    file produced: hello.o

  4. Linking: combine object files into one file that we will execute.

    clang hello.o -lm
    

    OR (let's say I also want to link hello2.o)

    clang hello.o hello2.o
    

So, here come the questions:

  1. Is the process described the correct one?

  2. In the linking stage, we link together .o (object code) files. I know that math.h resides in the /usr/include directory. Where is math.o? How does the linker find it?

  3. What are .a (static libraries) and .so (dynamic libraries) in Linux? And how are they related to .o files and the linking stage?

  4. Let's say I want to share a library I made with the world. I have a mylib.c file, in which I have declared and implemented my functions. How would I go about sharing this so that people would include it in their projects by doing either #include <mylib.h> or #include "mylib.h"?

Upvotes: 11

Views: 1499

Answers (3)

Andrea Biondo

Reputation: 1686

  1. Yes, though going through assembly is an extra step (you can just compile the C source straight to an object file). Internally, the compiler has many more stages: parsing code into an AST, generating intermediate code (e.g. LLVM IR for clang), optimizing, etc.
  2. math.h just declares prototypes for the standard math library libm (which you link with -lm). In the static variant, libm.a, the functions themselves live in object files archived inside it (see below).
  3. Static libraries are just archives of object files. The linker checks which symbols are used and extracts and links only the object files that export those symbols. These libraries can be manipulated with ar (for example, ar -t lists the object files in a library). Dynamic (or shared) libraries are not included in the output binary. Instead, the symbols your code needs are loaded at runtime.
  4. You would simply create a header file with your externed prototypes:

    #ifndef MYLIB_H
    #define MYLIB_H
    
    extern int mylib_something(char *foo, int baz);
    
    #endif
    

    and ship it with your library. Of course the developer must also link (dynamically) against your library.
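
To make item 4 more concrete, here is a minimal sketch of what the header and its matching source file might contain. The answer only gives the prototype, so the body of mylib_something is purely hypothetical: it is assumed here to return the length of foo plus baz. In a real project the two parts would live in separate files (mylib.h and mylib.c); they are shown together only so the sketch compiles on its own.

```c
/* ---- mylib.h: the public interface you ship ---- */
#ifndef MYLIB_H
#define MYLIB_H

/* Hypothetical behaviour (an assumption): returns strlen(foo) + baz. */
extern int mylib_something(char *foo, int baz);

#endif /* MYLIB_H */

/* ---- mylib.c: the implementation, compiled into the library ---- */
/* In a real project this file would begin with: #include "mylib.h" */
#include <assert.h>
#include <string.h>

int mylib_something(char *foo, int baz)
{
    assert(foo != NULL);            /* precondition: a valid C string */
    return (int)strlen(foo) + baz;  /* length of foo plus baz */
}
```

Assuming that layout, something like `clang -c mylib.c` followed by `ar rcs libmylib.a mylib.o` should produce a static library, and `clang -shared -fPIC mylib.c -o libmylib.so` a shared one. Users would then point the compiler at the header with `-I`, and link with `-L` plus `-lmylib`.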

The advantage of static libraries is reliability: there will be no surprises, because you already linked your code against the exact version you're sure it works with. Static linking can also be useful when you're using uncommon or bleeding-edge libraries and you don't want to install them as shared. This comes at the cost of increased binary size.

Shared libraries produce smaller binaries (because the library is not in the binary) with a smaller RAM footprint (because the OS can load the library once and share it among many processes), but they require a bit more care to make sure you're loading exactly what you want (e.g. see DLL Hell on Windows).

As @iharob notes, their advantages don't stop at binary size. For example, if a bug is fixed in a shared library, all programs benefit from it (as long as the fix doesn't break compatibility). Shared libraries also provide abstraction between the external interface and the implementation. For example, say an OS provides a library for applications to interface with it. With updates, the OS interface changes, and the library implementation tracks those changes. If it were a static library, all programs would have to be relinked against the new version. If it were a shared library, they wouldn't even notice (as long as the external interface stays the same). Another example is Linux libraries that wrap system- or distro-specific aspects behind a common interface.

Upvotes: 5

dbush

Reputation: 223739

The process you describe above is correct. In the vast majority of cases, however, the C code is preprocessed, compiled, and assembled in a single step as follows:

clang -c hello.c

Performing separate preprocessing is typically only done for debugging. Conversion to assembly is almost never done unless you intend to do some manual assembly level optimization, which is rarely necessary.

With regard to linking, the -l option tells the linker to look for a library named "lib{name}.so" or "lib{name}.a", preferring the shared one by default. In your example, -lm tells the linker to link with libm.so. By default it searches the standard system library directories such as /usr/lib; you can use the -L option (once per directory) to add more directories to search.

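You use the linker options -Bstatic and -Bdynamic (GNU ld) to switch between linking with static libraries and dynamic libraries. When the linker is invoked through clang, pass them with -Wl, so that the driver forwards them:

clang hello.o -lm -Wl,-Bstatic -lstaticlib -Wl,-Bdynamic -ldynamiclib

This will link with libm.so, libstaticlib.a, and libdynamiclib.so.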

Static libraries are linked directly to your executable like .o files are. In contrast, dynamic libraries are kept separate from your executable and are loaded up at run time.

Upvotes: 4

Iharob Al Asimi

Reputation: 53006

  1. Yes, this is the process generally.
  2. There is no math.o file; the -lm switch links to libm.so (a shared object, hence .so), where all the functions declared in math.h are defined.
  3. Let's answer this in two parts:

    Static libraries

       Are simply collections of object files saved in an archive format.

    Shared libraries

       Are (on Linux) ELF files that define symbols the same way executables do. You link your program against them so that it can use those symbols, and at run time a loader (the dynamic linker) maps the library into the process and resolves them.

       This is pretty much the same on other platforms, like .dlls on Windows. They are basically compiled programs that lack a main() function, so they cannot be executed directly. They contain executable code to be loaded at runtime. You can in fact do this yourself by using dlopen(3) on Linux.
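
As a rough illustration of the dlopen(3) remark, the sketch below loads the shared math library by hand and calls its cos function through a pointer obtained from dlsym. The helper name cos_via_dlopen is hypothetical, and "libm.so.6" is assumed to be the library's name, as it usually is on glibc-based Linux systems; other systems may use a different name.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Loads the shared math library at run time and calls its cos() on x.
 * On success stores the result in *out and returns 0; returns -1 if the
 * library or the symbol cannot be found.
 * Assumption: the library is installed as "libm.so.6" (typical for glibc). */
int cos_via_dlopen(double x, double *out)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Look the symbol up by name. POSIX allows converting the returned
     * void * to a function pointer for this purpose. */
    double (*cos_fn)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cos_fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = cos_fn(x);
    dlclose(handle);   /* drop our reference to the library */
    return 0;
}
```

Nothing here requires -lm at link time, because the library is located and loaded while the program runs. On older glibc versions you may need to link with -ldl to get dlopen itself.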


Note: in the code you posted, some of this will not happen: you didn't use anything from math.h, so linking to libm.so is completely unneeded. Compilers also try to optimize the generated code, and in your case the program is equivalent to the simplest Hello World. But the rest of the question is valid and worth answering.

Upvotes: 2
