Reputation: 2059
It is known that some linkers do not report multiple definition errors when there are multiple definitions in static libraries. See for example "Multiple definition within static library" or "Linker does not emit multiple definition error when same symbol coexists in object file and static library".
The "problem" can be easily reproduced like this:
File main1.cpp
#include <iostream>
int main()
{
std::cout << "Hello, World 1!" << std::endl;
return 0;
}
File main2.cpp
#include <iostream>
int main()
{
std::cout << "Hello, World 2!" << std::endl;
return 0;
}
and then:
g++ -c main1.cpp
ar rvs main1.a main1.o
g++ -c main2.cpp
ar rvs main2.a main2.o
g++ main1.a main2.a
./a.out
will give no error message, but the output:
Hello, World 1!
This may be very dangerous as the behaviour of the program is undefined according to the One Definition Rule: https://en.cppreference.com/w/cpp/language/definition.
And if, for example, a unit test application is built from static libraries that contain multiple definitions of important functions, the unit test may not even test what it is supposed to test.
Of course, the setting can be fixed by removing the multiple definitions. But that does not prevent unintentional re-adding of multiple definitions in the future.
Is there a way to automatically detect and/or prevent multiple definitions in static libraries also for the future?
Upvotes: 2
Views: 102
Reputation: 61327
There is no uncertainty as to which definition of main is linked in your example, and the One Definition Rule is honoured.
The order in which libraries are linked matters, and it is the user's responsibility to link them in the order that links the desired definitions into the program.
There are no linkers that diagnose the existence of multiple definitions of a symbol in the static libraries in a linkage, but all linkers fail a linkage that would link multiple definitions of a symbol into the program, no matter where they come from.
Determining whether a set of static libraries contains multiple definitions of any symbol is straightforward, if you want to do it for any reason.
As a principle, however, it is unnecessary and undesirable to prevent there being multiple definitions of a symbol within the static libraries in a linkage.
First review the Stack Overflow wiki on static libraries. A static library is an archive of object files that may be input to a linkage for the linker to extract and link just those object files that provide definitions for hitherto undefined symbol references that have accrued in the linkage at the point where the static library is input.
Thus your linkage:
g++ main1.a main2.a
proceeds like this:
Before any other object files are linked, g++ (or gcc) links the C runtime start-up code, which for my toolchain and likely yours too is the object file:
/usr/lib/x86_64-linux-gnu/Scrt1.o
This object file defines the C runtime function _start, the operating system's entry to the program, which runs invariant program initializations ultimately concluding with a call to main. Here is its symbol table:
$ readelf --syms --wide /usr/lib/x86_64-linux-gnu/Scrt1.o
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .text
2: 0000000000000000 32 OBJECT LOCAL DEFAULT 2 __abi_tag
3: 0000000000000000 38 FUNC GLOBAL DEFAULT 3 _start
4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND main
5: 0000000000000000 0 NOTYPE WEAK DEFAULT 8 data_start
6: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
7: 0000000000000000 4 OBJECT GLOBAL DEFAULT 5 _IO_stdin_used
8: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND __libc_start_main
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 8 __data_start
in which you can see that _start is a defined function and main is an undefined reference. So with the linkage of Scrt1.o, there is an undefined reference to main in the program.
Next, the linker consumes main1.a. It examines the symbol tables of the object files in this library:
$ readelf --syms --wide main1.a
File: main1.a(main1.o)
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main1.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000010 1 OBJECT LOCAL DEFAULT 5 _ZNSt8__detail30__integer_to_chars_is_unsignedIjEE
5: 0000000000000011 1 OBJECT LOCAL DEFAULT 5 _ZNSt8__detail30__integer_to_chars_is_unsignedImEE
6: 0000000000000012 1 OBJECT LOCAL DEFAULT 5 _ZNSt8__detail30__integer_to_chars_is_unsignedIyEE
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt21ios_base_library_initv
8: 0000000000000000 58 FUNC GLOBAL DEFAULT 1 main
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt4cout
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZNSolsEPFRSoS_E
to see if any of those object files (there is only one in this case) provide external (GLOBAL) definitions for symbols that are at this point undefined in the program. The answer is Yes: main1.a(main1.o) defines the hitherto undefined symbol main.
The linker therefore copies main1.o out of main1.a and links it into the program. That defines main in the program.
Next, the linker consumes main2.a and again searches the symbol tables of the object files therein:
$ readelf --syms --wide main2.a
File: main2.a(main2.o)
Symbol table '.symtab' contains 14 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main2.cpp
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
4: 0000000000000010 1 OBJECT LOCAL DEFAULT 5 _ZNSt8__detail30__integer_to_chars_is_unsignedIjEE
5: 0000000000000011 1 OBJECT LOCAL DEFAULT 5 _ZNSt8__detail30__integer_to_chars_is_unsignedImEE
6: 0000000000000012 1 OBJECT LOCAL DEFAULT 5 _ZNSt8__detail30__integer_to_chars_is_unsignedIyEE
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt21ios_base_library_initv
8: 0000000000000000 58 FUNC GLOBAL DEFAULT 1 main
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt4cout
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZNSolsEPFRSoS_E
for definitions that it needs. The only external definition provided here is again main (the symbol table is identical to the last one, except for the source file name), which is already defined in the program. So the linker needs no object files from main2.a and links none: it might as well not exist.
The linker passes on to the remaining input object files and libraries (which in this case are the boilerplate ones that g++ inputs to the linker by default) in its quest to resolve remaining undefined references:
7: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt21ios_base_library_initv
9: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt4cout
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _ZNSolsEPFRSoS_E
which it will finally achieve when it gets to:
/usr/lib/x86_64-linux-gnu/libstdc++.so.6
which is the Standard C++ Library, linked by default.
The end.
The linkage is unambiguously equivalent to:
g++ main1.a
which in this case is equivalent to:
g++ main1.o
as we can check by asking the linker to show us where main is referenced and defined in all three cases:
$ g++ main1.a main2.a -Wl,-trace-symbol=main && ./a.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o: reference to main
/usr/bin/ld: main1.a(main1.o): definition of main
Hello, World 1!
$ g++ main1.a -Wl,-trace-symbol=main && ./a.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o: reference to main
/usr/bin/ld: main1.a(main1.o): definition of main
Hello, World 1!
$ g++ main1.o -Wl,-trace-symbol=main && ./a.out
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/Scrt1.o: reference to main
/usr/bin/ld: main1.o: definition of main
Hello, World 1!
There is only ever one definition linked and it is the first one the linker finds, in main1.a(main1.o) or main1.o, which are the same one.
A multiple definition error interdicts the existence of a program that would violate the One Definition Rule. A multiple definition error is provoked by the linkage:
$ g++ main1.o main2.o
/usr/bin/ld: main2.o: in function `main':
main2.cpp:(.text+0x0): multiple definition of `main'; main1.o:main1.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
That is because any input object file is unconditionally linked, while an input static library is searched for contained object files that are needed. If object files were linked on an as-needed basis then linkage could never get started.
It is also provoked by the non-default linkage:
$ g++ -Wl,--whole-archive main1.a main2.a -Wl,--no-whole-archive
/usr/bin/ld: main2.a(main2.o): in function `main':
main2.cpp:(.text+0x0): multiple definition of `main'; main1.a(main1.o):main1.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
That is because the linker options --whole-archive ... --no-whole-archive instruct the linker to abandon its default as-needed principle with respect to any enclosed static libraries and instead link all the member object files whether it needs them or not.
Accordingly, if you want to know in advance whether a set of static libraries contains multiple definitions of any symbol, you can do so by experimentally attempting to link them --whole-archive into a single object file, e.g.
$ ld -r --whole-archive main1.a main2.a --no-whole-archive
ld: main2.a(main2.o): in function `main':
main2.cpp:(.text+0x0): multiple definition of `main'; main1.a(main1.o):main1.cpp:(.text+0x0): first defined here
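If you would rather get a report than a failed link, a different technique is to compare the symbol tables directly with nm. The following is only a sketch under assumptions: it presumes GNU binutils nm (whose -A, -g and --defined-only options produce lines of the form archive:member:address TYPE name), and find_duplicate_defs is a hypothetical helper name, not part of any toolchain. It ignores weak and common symbols, since those may legitimately be defined more than once.

```shell
# Sketch: report strong external symbols defined in more than one archive member.
# Reads `nm -A -g --defined-only` style lines on stdin, i.e.
#   archive:member:address TYPE name
# find_duplicate_defs is a hypothetical helper name.
find_duplicate_defs() {
  awk '
    # Keep strong definitions only (absolute, bss, data, read-only, text).
    # Weak (W, V), unique (u) and common (C) symbols may legitimately repeat.
    $2 ~ /^[ABDGRST]$/ {
      origin = $1
      sub(/:[0-9A-Fa-f]+$/, "", origin)   # drop the trailing :address
      where[$3] = ($3 in where) ? where[$3] ", " origin : origin
      count[$3]++
    }
    END {
      for (sym in count)
        if (count[sym] > 1) { print sym ": " where[sym]; dup = 1 }
      exit (dup ? 1 : 0)                  # non-zero status when duplicates exist
    }'
}

# Usage (assumes the two archives from the question are in the current directory):
#   nm -A -g --defined-only main1.a main2.a | find_duplicate_defs
```

Because the function exits with a non-zero status when it finds a duplicate, it could be run as a build or CI step over whatever set of archives you care about.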
For any symbol foo of which there are multiple definitions in static libraries input to a linkage, the linker by default will link the first member object file that defines the symbol after an undefined reference to foo has been linked, and will not then go looking to link multiple definitions. This principle is the basis for the immemorial technique of library interposition, e.g. a library libmymath.a is input to a linkage before libmath.a where libmymath.a provides preferred definitions of some of the functions defined in libmath.a, so that the definitions of libmymath.a will be linked instead of those of libmath.a.
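Interposition can be tried out with a small experiment. This is a sketch, assuming a C compiler available as cc and ar on the PATH; the function impl() and every file name other than libmath.a and libmymath.a are made up for illustration.

```shell
# Sketch of static-library interposition; assumes cc and ar are on PATH.
# impl() and the source file names are hypothetical.
interposition_demo() (
  dir=$(mktemp -d) && cd "$dir" || exit 2
  trap 'rm -rf "$dir"' EXIT
  # Two libraries that both define impl(), and a program that calls it.
  printf 'const char *impl(void) { return "libmath"; }\n'   > math.c
  printf 'const char *impl(void) { return "libmymath"; }\n' > mymath.c
  printf '%s\n' '#include <stdio.h>' 'const char *impl(void);' \
    'int main(void) { puts(impl()); return 0; }' > main.c
  cc -c math.c mymath.c main.c || exit 2
  ar rcs libmath.a math.o      || exit 2
  ar rcs libmymath.a mymath.o  || exit 2
  # First link: libmymath.a is searched first, so its impl() is linked.
  # Second link: the order is swapped, so the impl() from libmath.a is linked.
  cc main.o libmymath.a libmath.a -o preferred && ./preferred &&
  cc main.o libmath.a libmymath.a -o swapped && ./swapped
)
```

Calling interposition_demo should print libmymath for the first link and libmath for the second, with no multiple definition error in either case.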
The sole (messed-up) scenario in which the default linkage of multiple static libraries that each define foo will provoke a multiple definition error is one in which the member object file that is linked to resolve foo introduces an undefined reference to bar, and then the member object file that is linked to resolve bar also contains a second definition of foo (or perhaps the same result from a longer chain of references and object files).
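That scenario can be reproduced with a sketch like the following, again assuming cc and ar are on the PATH. The archive names libone.a and libtwo.a and the source file names are hypothetical; foo and bar play the roles described above.

```shell
# Sketch of the chained scenario; assumes cc and ar are on PATH.
# libone.a, libtwo.a and the source file names are hypothetical.
chain_conflict_demo() (
  dir=$(mktemp -d) && cd "$dir" || exit 2
  trap 'rm -rf "$dir"' EXIT
  # libone.a: defines foo, which needs bar.
  printf 'int bar(void);\nint foo(void) { return bar(); }\n'          > foo_first.c
  # libtwo.a: one object file that defines bar and a second foo.
  printf 'int bar(void) { return 3; }\nint foo(void) { return 2; }\n' > bar_and_foo.c
  printf 'int foo(void);\nint main(void) { return foo(); }\n'         > main.c
  cc -c foo_first.c bar_and_foo.c main.c || exit 2
  ar rcs libone.a foo_first.o            || exit 2
  ar rcs libtwo.a bar_and_foo.o          || exit 2
  # foo is resolved from libone.a; resolving bar then pulls in bar_and_foo.o,
  # whose second definition of foo makes this link fail.
  cc main.o libone.a libtwo.a -o prog
)
```

The final link is expected to fail with a multiple definition error for foo, even though neither archive is linked --whole-archive.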
Note that it is very unorthodox for a static library to contain an object file that defines main. It looks as if you have come across GoogleTest's libgtest_main.a and libgmock_main.a, which do so, but these are exotic counterexamples.
Upvotes: 3