Reputation: 588
I recently discovered LLVM's linker, lld, which is praised for very fast linking. Indeed, I tested it and the results were awesome: the link time in my case dropped dramatically compared to gold.
However, my knowledge of link-time optimization is limited. As far as I understand from reading around, the compiler emits some extra data into the object files, representing some of its internal structures, which is then used at the link stage. So my concern is whether link-time optimization (and its benefits) is affected by this compiler/linker mix. I would appreciate some explanation on the matter!
I used gcc version 9.2.0 and lld version 10.0.0.
Command I used for generating object files:
/opt/gcc/9.2.0/bin/c++ -fPIE -flto -ffat-lto-objects -fuse-linker-plugin -m64 -O3 -g -DNDEBUG -o my_object.cpp.o -c my_source_file.cpp
For linking:
#-fuse-ld=gold
/opt/gcc/9.2.0/bin/c++ -fPIE -flto -ffat-lto-objects -fuse-linker-plugin -m64 -pie -fuse-ld=gold -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -static-libstdc++ -static-libgcc -Wl,--threads -Wl,--thread-count,1
#-fuse-ld=lld
/opt/gcc/9.2.0/bin/c++ -fPIE -flto -ffat-lto-objects -fuse-linker-plugin -m64 -pie -fuse-ld=lld -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -static-libstdc++ -static-libgcc -Wl,--threads -Wl,
Upvotes: 9
Views: 4420
Reputation: 588
I did some research and finally concluded for myself that no LTO is done if we use lld when compiling with gcc. What I did:
Based on this somewhat vague presentation: https://www.slideshare.net/chimerawang/gcc-lto, I found that the linker does not perform the optimization itself. Instead, after reading the symbols from all the object files, it passes the information to lto-wrapper, which then carries out the optimization through further processes. So I ran a test with a hello-world cpp file, compiling it with the -v flag, and indeed I saw the chain of calls mentioned above (collect2 (linker) -> lto-wrapper -> lto1). But that was only with the default linker or the gold linker. When I used the -fuse-ld=lld flag, only the collect2 process was called. This was the first thing that made me believe LTO was not done at all.
But hey, maybe the lld linker internalizes the LTO process, so that it is done without calling any other process. So I ran another test to see whether LTO is done (based on this article). Basically, one cpp file calls, 100 000 000 times, a function that is defined in another cpp file and does nothing. With plain -O2 optimization, the resulting binary runs in ~200 ms, as the compiler cannot optimize out the useless function calls. When the -flto flag is added and either the ld or the gold linker is used, the resulting binary runs in ~2 ms. But with the lld linker, the resulting binary also runs in ~200 ms. So the binary linked by lld with LTO runs as slowly as the one linked by lld without LTO. No sign of optimization whatsoever.
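For reference, the kind of test described above can be sketched roughly as follows. This is not the code from the article, just a minimal illustration; the names do_nothing and time_calls and the file names noop.cpp and main.cpp are made up for the example. It is shown as one listing for brevity, but the two parts have to live in separate cpp files for the experiment to mean anything, otherwise the compiler can inline the call even without LTO.

```cpp
#include <cassert>
#include <chrono>

// ---- noop.cpp (hypothetical file name): the callee, in its own translation unit ----
void do_nothing() {}

// ---- main.cpp (hypothetical file name): the caller ----
void do_nothing();  // only the declaration is visible to the caller

// Calls do_nothing() n times and returns the elapsed wall-clock time in milliseconds.
double time_calls(long n) {
    assert(n >= 0);
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        do_nothing();  // without cross-file inlining this stays a real call
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_calls(100000000) from main and printing the result, once per linker, should be enough to compare the builds: if LTO really happened, the loop can be inlined away and the reported time should collapse.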
It is worth mentioning that with the lld linker, the link command fails unless the objects are compiled with -ffat-lto-objects. This flag makes the object files larger, because the compiler emits not only the LTO code but also regular code that can be linked without LTO.
So, considering the run time of the binary linked with lld, and the fact that the objects need to be compiled with -ffat-lto-objects, I concluded that when the lld linker is used, LTO is not performed at all; lld simply links the binary from the non-LTO code generated by the compiler.
Upvotes: 10