Benjamin Bihler

Reputation: 2059

How can multiple definitions in static libraries be detected/prevented?

It is known that some linkers do not report multiple definition errors when there are multiple definitions in static libraries. See, for example, the questions "Multiple definition within static library" and "Linker does not emit multiple definition error when same symbol coexists in object file and static library".

The "problem" can be easily reproduced like this:

File main1.cpp

#include <iostream>

int main()
{
    std::cout << "Hello, World 1!" << std::endl;
    return 0;
}

File main2.cpp

#include <iostream>

int main()
{
    std::cout << "Hello, World 2!" << std::endl;
    return 0;
}

and then:

g++ -c main1.cpp
ar rvs main1.a main1.o
g++ -c main2.cpp
ar rvs main2.a main2.o
g++ main1.a main2.a
./a.out

will give no error message, but the output:

Hello, World 1!

This may be very dangerous as the behaviour of the program is undefined according to the One Definition Rule: https://en.cppreference.com/w/cpp/language/definition.

And if, for example, a unit test application is built from static libraries that contain multiple definitions of important functions, the unit test may not even test what it is supposed to test.

Of course, the situation can be fixed by removing the multiple definitions. But that does not prevent multiple definitions from being re-added unintentionally in the future.

Is there a way to automatically detect and/or prevent multiple definitions in static libraries also for the future?

Upvotes: 2

Views: 102

Answers (1)

Mike Kinghan

Reputation: 61327

There is no uncertainty as to which definition of main is linked in your example, and the One Definition Rule is honoured. The order in which libraries are linked matters, and it is the user's responsibility to link them in the order that links the desired definitions into the program. There are no linkers that diagnose the existence of multiple definitions of a symbol in the static libraries in a linkage, but all linkers fail a linkage that would link multiple definitions of a symbol into the program, no matter where they come from. Determining whether a set of static libraries contains multiple definitions of any symbol is straightforward, if you want to do it for any reason. As a principle, however, it is unnecessary and undesirable to prevent there being multiple definitions of a symbol within the static libraries in a linkage.

First review the Stack Overflow wiki on static libraries. A static library is an archive of object files that may be input to a linkage for the linker to extract and link just those object files that provide definitions for hitherto undefined symbol references that have accrued in the linkage at the point where the static library is input.

Thus your linkage:

g++ main1.a main2.a

proceeds like this:

Before any other object files are linked, g++ (or gcc) links the C runtime start-up code, which for my toolchain and likely yours too is the object file:

/usr/lib/x86_64-linux-gnu/Scrt1.o

This object file defines the C runtime function _start, the operating system's entry to the program, which runs invariant program initializations ultimately concluding with a call to main. Here is its symbol table:

$ readelf --syms --wide /usr/lib/x86_64-linux-gnu/Scrt1.o

Symbol table '.symtab' contains 10 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 .text
     2: 0000000000000000    32 OBJECT  LOCAL  DEFAULT    2 __abi_tag
     3: 0000000000000000    38 FUNC    GLOBAL DEFAULT    3 _start
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND main
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT    8 data_start
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
     7: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    5 _IO_stdin_used
     8: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __libc_start_main
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    8 __data_start
     

in which you can see that _start is a defined function and main is an undefined reference. So with the linkage of Scrt1.o, there is an undefined reference to main in the program.

Next, the linker consumes main1.a. It examines the symbol tables of the object files in this library:-

$ readelf --syms --wide main1.a

File: main1.a(main1.o)

Symbol table '.symtab' contains 14 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main1.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 .rodata
     4: 0000000000000010     1 OBJECT  LOCAL  DEFAULT    5 _ZNSt8__detail30__integer_to_chars_is_unsignedIjEE
     5: 0000000000000011     1 OBJECT  LOCAL  DEFAULT    5 _ZNSt8__detail30__integer_to_chars_is_unsignedImEE
     6: 0000000000000012     1 OBJECT  LOCAL  DEFAULT    5 _ZNSt8__detail30__integer_to_chars_is_unsignedIyEE
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt21ios_base_library_initv
     8: 0000000000000000    58 FUNC    GLOBAL DEFAULT    1 main
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt4cout
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZNSolsEPFRSoS_E
    

to see if any of those object files (there is only one in this case) provide external (GLOBAL) definitions for symbols that are at this point undefined in the program. The answer is Yes: main1.a(main1.o) defines the hitherto undefined symbol main.

The linker therefore copies main1.o out of main1.a and links it into the program. That defines main in the program.

Next, the linker consumes main2.a and again searches the symbol tables of the object files therein:-

$ readelf --syms --wide main2.a

File: main2.a(main2.o)

Symbol table '.symtab' contains 14 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main2.cpp
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 .rodata
     4: 0000000000000010     1 OBJECT  LOCAL  DEFAULT    5 _ZNSt8__detail30__integer_to_chars_is_unsignedIjEE
     5: 0000000000000011     1 OBJECT  LOCAL  DEFAULT    5 _ZNSt8__detail30__integer_to_chars_is_unsignedImEE
     6: 0000000000000012     1 OBJECT  LOCAL  DEFAULT    5 _ZNSt8__detail30__integer_to_chars_is_unsignedIyEE
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt21ios_base_library_initv
     8: 0000000000000000    58 FUNC    GLOBAL DEFAULT    1 main
     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt4cout
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZNSolsEPFRSoS_E

for definitions that it needs. The only external definition provided here is again main (the symbol table is identical to the last one, except for the source file name), which is already defined in the program. So the linker needs no object files from main2.a and links none: it might as well not exist.

The linker passes on to the remaining input object files and libraries (which in this case are the boilerplate ones that g++ inputs to the linker by default) in its quest to resolve remaining undefined references:

     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt21ios_base_library_initv

     9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt4cout
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZNSolsEPFRSoS_E

which it will finally achieve when it gets to:

/usr/lib/x86_64-linux-gnu/libstdc++.so.6

which is the Standard C++ Library, linked by default.

The end.

The linkage is unambiguously equivalent to:

g++ main1.a

which in this case is equivalent to:

g++ main1.o

as we can check by asking the linker to show us where main is referenced and defined in all three cases:

$ g++ main1.a main2.a -Wl,-trace-symbol=main && ./a.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o: reference to main
/usr/bin/ld: main1.a(main1.o): definition of main
Hello, World 1!

$ g++ main1.a -Wl,-trace-symbol=main && ./a.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o: reference to main
/usr/bin/ld: main1.a(main1.o): definition of main
Hello, World 1!
    
$ g++ main1.o -Wl,-trace-symbol=main && ./a.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o: reference to main
/usr/bin/ld: main1.o: definition of main
Hello, World 1!

There is only ever one definition linked, and it is the first one the linker finds: in main1.a(main1.o) or main1.o, which are one and the same.

A multiple definition error interdicts the existence of a program that would violate the One Definition Rule. A multiple definition error is provoked by the linkage:

$ g++ main1.o main2.o
/usr/bin/ld: main2.o: in function `main':
main2.cpp:(.text+0x0): multiple definition of `main'; main1.o:main1.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

That is because an input object file is linked unconditionally, while an input static library is only searched for the member object files that are needed. If object files were linked on an as-needed basis, linkage could never get started.

It is also provoked by the non-default linkage:

$ g++ -Wl,--whole-archive main1.a main2.a -Wl,--no-whole-archive
/usr/bin/ld: main2.a(main2.o): in function `main':
main2.cpp:(.text+0x0): multiple definition of `main'; main1.a(main1.o):main1.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

That is because the linker options --whole-archive ... --no-whole-archive instruct the linker to abandon its default as-needed policy for any static libraries enclosed between them, and instead to link all of their member object files, whether it needs them or not.

Accordingly, if you want to know in advance whether a set of static libraries contains multiple definitions of any symbol, you can find out by experimentally attempting to link them --whole-archive into a single relocatable object file, e.g.

$ ld -r --whole-archive main1.a main2.a --no-whole-archive
ld: main2.a(main2.o): in function `main':
main2.cpp:(.text+0x0): multiple definition of `main'; main1.a(main1.o):main1.cpp:(.text+0x0): first defined here 

For any symbol foo of which there are multiple definitions in the static libraries input to a linkage, the linker by default links the first member object file that defines foo and is input after an undefined reference to foo has been linked; it does not then go looking for further definitions. This principle is the basis of the immemorial technique of library interposition: e.g. a library libmymath.a that provides preferred definitions of some of the functions defined in libmath.a is input to the linkage before libmath.a, so that the definitions in libmymath.a are linked instead of those in libmath.a.

The sole (messed-up) scenario in which the default linkage of multiple static libraries that each define foo will provoke a multiple definition error is one in which the member object file that is linked to resolve foo introduces an undefined reference to bar, and then the member object file that is linked to resolve bar also contains a second definition of foo (or perhaps the same result from a longer chain of references and object files).

Note that it is very unorthodox for a static library to contain an object file that defines main. It looks as if you have come across GoogleTest's libgtest_main.a and libgmock_main.a, which do so, but these are exotic counterexamples.

Upvotes: 3
