Bregalad
Bregalad

Reputation: 634

How does gcc's linktime optimisation (-flto flag) work

I understand more or less the idea: When compiling separate modules and producing assembly code, functions calling each other have to respect strictly the calling convention, which kills the opportunity for many optimisations when compiling separate modules.

For instance if I have function A which calls function B which calls function C, all 3 in their own separate source files, it becomes possible to allocate registers evenly within the functions so that no register saving on the stack is necessary at all during those calls. With traditional compile-assembly-linking this is not possible, as the caller-saved and callee-saved registers are imposed by the calling convention.

Another optimisation is to inline functions which are called only once. This previously was possible only if a function is local, but thanks to linktime optimisation it's now possible even if the function is in another source file.

Now, if I compile with both -flto and -S flags, I see that instead of normal assembly instructions, gcc generates an encoded representation of the program, such as this:

    .section        .gnu.lto_.inline.c3c5e6ef8ec983c,"dr0"
    .ascii "x\234mQ;N\303@\20}\273\353\17\370C\234\20\242`\"!Q\20\11Ah\322&\25\242\314\231|\4\32\220\220(,$.@\205D\343\3P Z.\341Tn\231\35\274\31L\342\342\355\314\274\371<\317\30\354\376\356\365\357\333\7\262"
    .ascii "1\240G\325\273\202\7\216\232\204\36\205"
    .ascii "8\242\370\240|\222"
    .ascii "8\374\21\205ty\352\"*r\340!:!n\357n%]\224\345\10|\304\23\342\274z\346"
    .ascii "8\35\23\370\7\4\1\366s\362\203j\271]\27bb{\316\353\27\343\310\4\371\374\237*n#\220\342rA\31"
    .ascii "7\365\263\327\231\26\364\10"
    .ascii "2\\-\311\277\255^w\220}|\340\233\306\352\263\362Qo+e+\314\354\277\246\354\252\277\20\364\224%T\233'eR\301{\32\340\372\313\362\263\242\331\314\340\24\6\21s\210\243!\371\347\325\333&m\210\305\203\355\277*\326\236\34\300-\213\327\306\2Td\317\27\231\26tl,\301\26\21cd\27\335#\262L\223"
    .ascii "8\353\30\351\264{I\26\316\11\14"
    .ascii "9\326h\254\220B}6a\247\13\353\27M\274\231"
    .ascii "0\23M\332\272\272%d[\274\36Q\200\37\321\1&\35"

Since the data is in its own particular section, the linker sees this, and does the code generation. If the module was written in either assembly or with no -flto flag, then the linker would see data in the .text section instead, so there is no confusion possible for the linker.

The problem is: How can the linker generate code? Normally only gcc can generate code, the linker's role is just here to change a few offsets and adapt the binary format. In order to generate code, the linker would need to contain a second copy of the entire gcc backend (half of the compiler which generates assembly code from intermediate representation), as well as the entire assembler (since no assembly code was produced). How is such a thing possible, especially considering that binutils is a completely separate entity from gcc, developed by different teams?

Upvotes: 1

Views: 6535

Answers (2)

Peter - Reinstate Monica
Peter - Reinstate Monica

Reputation: 16112

From https://gcc.gnu.org/wiki/LinkTimeOptimization:

Despite the "link time" name, LTO does not need to use any special linker features. The basic mechanism needed is the detection of GIMPLE sections inside object files. This is currently implemented in collect2 [which is called by gcc; -ps]. Therefore, LTO will work on any linker already supported by GCC.

I assume this means you must link calling the compiler driver gcc. Simply linking with the system's vanilla linker wouldn't optimize the whole program, as you already concluded.

Update:

https://gcc.gnu.org/onlinedocs/gccint/Collect2.html says

The program collect2 is installed as ld in the directory where the passes of the compiler are installed. When collect2 needs to find the real ld it tries the following file names: [...]

(The page goes on detailing how collect2 looks for configuration-dependent executables and ones with well-known names like real-ld, finally even ld; but will not call itself recursively.)

Upvotes: 2

Tom Tromey
Tom Tromey

Reputation: 22599

GCC's -flto emits a serialized form of GCC's internal representation, as you discovered.

Then, at link time, the linker reinvokes GCC and passes it the objects that need final compilation. GCC reads the internal representation and does the work.

I think the actual work is done in collect2, which is part of GCC that is used when invoking the linker (I'm a little fuzzy on the details). There is also a "linker plugin" system that enables this to work a little better (like letting the linker decide how to split the compilation). This is implemented at least by the binutils ld and by gold; but as far as I recall this is just an optimization and isn't needed to get the basic -flto feature to work. You can see a bit more information on the original LTO project page; and maybe links from there would explain more.

There is more overlap between the GCC and binutils teams than you might think. The two projects share some code and have a long history of working together. Some people work on both projects.

Upvotes: 2

Related Questions