ninjaconcombre

Reputation: 534

List "never linked against" source files in a C project

I would like to know if someone is aware of a trick to retrieve the list of files that have been (or, ideally, will be) used by the linker to produce an executable.

Some kind of solution must exist: a static source analyzer, or a hack such as compiling with some unusual flags and analyzing the produced executable with another tool, or forcing the linker to output this information.

The goal is to provide a tool that strips unused source files from a list of source files.

The end goal is to ease the build process by allowing the user to give a list of usable source files. My tool would then compile only the ones actually used by the linker instead of everything.

This would allow some unit tests to remain runnable even if others are broken and can't compile, without asking the user to list every test's dependencies by hand in the CMake file.

I am targeting Linux for now, but in the future I will be interested in doing the same trick on other OSes. So I would like a cross-platform solution, even though I doubt I will get one :)

Thanks for your help

Edit, because I see that it is confusing: what I mean by

allowing the user to give a list of usable source files

is that in CMake, for example, if you use add_executable(name, sources), then sources is treated as the set of sources to compile and link.

I want to wrap add_executable so that sources is instead viewed as a set of source files that may be used if necessary.

Upvotes: 1

Views: 75

Answers (1)

Mike Kinghan

Reputation: 61307

I'm afraid the idea of detecting never linked source files is not a fruitful one.

To build a program, CMake will not compile a source file if it is not going to link the resulting object file into the program. I can understand how you might think that this happens, but it doesn't happen.

CMake already does what you would like it to do and the same is true of every other build automation system going back to their invention in the 1970s. The fundamental purpose of all such systems is to ensure that the building of a program compiles a source file name.(c|cc|f|m|...) if and only if the object file name.o is going to be linked into the program and is out of date or does not exist. You can always defeat this purpose by egregiously bad coding of the project's build spec (CMakeLists.txt, Makefile, SConstruct, etc.), but with CMake you would need to be really trying to do it, and trying quite expertly.
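The "if and only if" rule above can be written down as a tiny decision function. This is only a sketch of the rule as stated here, not of how CMake or Make is implemented: the function name and parameters are hypothetical, timestamps are plain numbers, and real build systems also track things such as header dependencies.

```python
def needs_compile(is_prerequisite, source_mtime, object_mtime):
    """Decide whether name.c must be compiled for a given target program.

    is_prerequisite: True if name.o is declared as an input of the program.
    source_mtime:    modification time of name.c.
    object_mtime:    modification time of name.o, or None if it does not exist.
    """
    # A source whose object file is not linked into the program is never compiled.
    if not is_prerequisite:
        return False
    # A missing object file must be built.
    if object_mtime is None:
        return True
    # Otherwise recompile only when the object file is out of date.
    return source_mtime > object_mtime


print(needs_compile(True, 300, 200))   # stale object file
print(needs_compile(True, 100, 200))   # up-to-date object file
print(needs_compile(False, 300, None)) # not a prerequisite of the program
```

The first argument is the only place where "is this file part of the program?" enters the decision, and it comes from the build spec, not from any analysis of the code.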

If you do not want name.c to be compiled and the object file name.o linked into a target program, then you do not tell the build system that name.o or name.c is a prerequisite of the program. Don't tell it what you know is not true. It is elementary competence not to specify redundant prerequisites of a build system target.

The linker will link all its input object files into an output program without question. It does not ask whether or not they are "needed" by the program because it cannot answer that question. Neither the linker nor any possible static analysis tool can know what program you intend to produce when you input some object files for linkage. It can only be assumed that you intend to produce the program that results from the linkage of those object files, assuming the linkage is successful.

If those object files cannot be linked into a program at all, the linker will tell you that, and why. Otherwise, if you have linked object files that you didn't intend to link, you can only discover that for yourself, by noticing the mistake in the build log, or failing that by testing the program and/or inspecting its contents and comparing your observations with your expectations.

Given your choice of object files for linkage, you can instruct the linker to detect any code sections or data sections it extracts from those object files in which no symbols are defined that can be referenced by the program, and to throw away all such unreferenced input sections instead of linking them into the program. This is called linktime "garbage collection". You tell the linker to do it by passing the option -Wl,-gc-sections in the gcc linkage command. See this question to learn how to maximise the collectible garbage. This is what you can do to remove redundant object code from the linkage.

But you can only collect any garbage from a program in this way if the program is dynamically opaque, i.e. not linked with the option -rdynamic. In that case the global symbols defined in the program's static image are not visible to the OS loader and cannot be referenced from outside its static image by dynamic libraries in the same process. The linker can then determine by static analysis that a symbol whose definition is not referenced in the program's static image cannot be referenced at all, since it cannot be referenced dynamically, and if all symbols defined in an input section are statically unreferenced then it can garbage-collect the section.

If the program has been linked -rdynamic then -Wl,-gc-sections will collect no garbage, and this is quite right, because if the program is not dynamically opaque then it is impossible for static analysis to determine that anything defined in its linkage cannot be referenced.
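The reasoning in the last three paragraphs can be illustrated with a toy model. This is a simplified sketch, not how any real linker is implemented: the section names, the symbol tables and the function live_sections are all made up for illustration, every symbol is assumed to be global, and the export_dynamic flag stands in for linking with -rdynamic.

```python
from collections import deque


def live_sections(sections, entry="main", export_dynamic=False):
    """Return the names of the sections a toy garbage collector would keep.

    sections maps a section name to (defined_symbols, referenced_symbols).
    """
    # Map each symbol to the section that defines it.
    definer = {}
    for name, (defined, _referenced) in sections.items():
        for symbol in defined:
            definer[symbol] = name

    # Roots: the entry point, plus every defined symbol when the program
    # exposes its global symbols dynamically.
    roots = {entry}
    if export_dynamic:
        roots |= set(definer)

    # Mark every section reachable from the roots through symbol references.
    live = set()
    queue = deque(definer[s] for s in roots if s in definer)
    while queue:
        section = queue.popleft()
        if section in live:
            continue
        live.add(section)
        for symbol in sections[section][1]:
            if symbol in definer:
                queue.append(definer[symbol])
    return live


# Hypothetical input sections: main calls used(); unused() calls helper().
SECTIONS = {
    ".text.main": ({"main"}, {"used"}),
    ".text.used": ({"used"}, set()),
    ".text.unused": ({"unused"}, {"helper"}),
    ".text.helper": ({"helper"}, set()),
}

print(sorted(live_sections(SECTIONS)))
# ['.text.main', '.text.used']
print(sorted(live_sections(SECTIONS, export_dynamic=True)))
# ['.text.helper', '.text.main', '.text.unused', '.text.used']
```

In the opaque case only what main can reach survives. Once every global symbol counts as a possible external reference, nothing can be proven unreachable, so nothing is collected.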

It's noteworthy that although -rdynamic is not a default linkage option for GCC, it is a default linkage option for CMake projects using the GCC toolchain. So to use linktime garbage collection in CMake projects you would always have to override the -rdynamic default. And obviously it would only be valid to do this if you have determined that it is alright for the program to be dynamically opaque.

Upvotes: 2
