I want to run static code analysis tools for C/C++ (and possibly Python, Java, etc.) on a large software project built with `make`. As is well known, `make` (or any other build tool) invokes the compiler and similar tools on the specified source files. Compilation can also be controlled by defining environment variables that are later passed to the compiler as arguments.
The key to accurate static analysis is to provide the defines and include paths exactly as they were passed to the compiler (basically all of its `-D` and `-I` arguments). This way, the tool can follow the same code paths the compiler followed.
The problem is that, given the project's complexity, there is no way to determine this environment statically: different files are built with different sets of defines, include paths, and other compilation flags.
The idea is that it should somehow be possible to capture each individual compiler invocation, with all the arguments passed to it, for every input file. With that information, after some straightforward filtering (e.g. there is no need to know `-O` optimization levels or `-W` warning settings), it should be possible to invoke the static analyzer on each input file with exactly the set of defines/includes that was used for that file.
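A minimal sketch of that filtering step, assuming GCC/Clang-style command lines: it keeps only `-D`, `-U`, `-I`, `-isystem`, `-include` and `-std=` options, collects the source files, and drops everything else (`-O*`, `-W*`, `-o`, ...). The function name `split_invocation` and the exact set of kept options are illustrative assumptions, not taken from any particular tool.

```python
import os

# Options that may take their value as the next argument ("-I inc").
SEPARATE_VALUE = {"-D", "-U", "-I", "-isystem", "-include"}
# Prefixes of options whose value is glued on ("-Iinc", "-DFOO=1", "-std=c11").
JOINED_PREFIXES = ("-D", "-U", "-I", "-std=")
SOURCE_EXTS = {".c", ".cc", ".cpp", ".cxx", ".c++"}


def split_invocation(argv):
    """Split a compiler argv into (preprocessor flags, source files)."""
    flags, sources = [], []
    i = 1  # argv[0] is the compiler itself
    while i < len(argv):
        arg = argv[i]
        if arg in SEPARATE_VALUE and i + 1 < len(argv):
            flags += [arg, argv[i + 1]]  # keep the option and its value
            i += 2
            continue
        if arg.startswith(JOINED_PREFIXES):
            flags.append(arg)
        elif os.path.splitext(arg)[1] in SOURCE_EXTS:
            sources.append(arg)
        # anything else (-O2, -Wall, -c, -o, main.o, ...) is dropped
        i += 1
    return flags, sources
```

The returned flags could then be appended to the analyzer's own command line for each source file, provided the analyzer accepts GCC-style `-D`/`-I` options.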
The question is: are there existing tools/workflows that implement the idea I've described? I am mostly interested in a solution for POSIX systems, but ideas for Windows are also welcome.
Here are a few ideas I've come up with on my own.
The most trivial solution would be to collect the output of `make` and process it afterwards. However, some projects have makefile rules that print very concise output instead of full command lines, so this may require tinkering with the Makefiles, which is not always desirable. In parallel builds, the console output of different jobs may also be interleaved and impossible to parse. Adapting this approach to other build systems (CMake) would not be trivial either, so it is far from the most convenient option.
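A rough sketch of post-processing a saved `make` log, assuming each compiler invocation is echoed on its own line with a GCC/Clang-like compiler as the first word; CMake-style `cd dir && /usr/bin/cc ...` lines are split on `&&`. The compiler-name heuristic in `is_compiler` is an assumption, and quiet rules such as `  CC  foo.o` are silently missed, which is exactly the weakness of this approach.

```python
import os
import shlex


def is_compiler(path):
    """Heuristic: recognise cc, c++, *gcc, *g++, *clang, *clang++ by name."""
    name = os.path.basename(path)
    return name in ("cc", "c++") or name.endswith(("gcc", "g++", "clang", "clang++"))


def compiler_commands(log_lines):
    """Yield the argv of every compiler invocation found in a make log."""
    for line in log_lines:
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a shell command we can parse
        segment = []
        # Split "cd dir && cc ..." style compound commands; the trailing
        # "&&" sentinel flushes the last segment.
        for tok in tokens + ["&&"]:
            if tok == "&&":
                if segment and is_compiler(segment[0]):
                    yield segment
                segment = []
            else:
                segment.append(tok)
```

For example, the log could be captured with `make -j1 2>&1 | tee build.log` (a serial build avoids interleaved lines) and then passed as `compiler_commands(open("build.log"))`.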
Running `make` under `ptrace` and recording all `exec*` system calls that start new programs, including compiler invocations. One would then need to parse the resulting trace. This approach is build-system and language agnostic (it will catch every invocation of any compiler for any language) and should work for parallel builds. However, it seems technically more complex. It is also unclear how much `ptrace` sitting on `make`'s back would slow down the build. Finally, it would be harder to port to Windows, since the program-tracing API is quite different there.
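One way to try the `ptrace` idea without writing a tracer is to let strace (which is built on `ptrace`) do the recording, e.g. `strace -f -s 4096 -e trace=execve -o build.trace make`, and parse its text output afterwards. The sketch below assumes that output format; `-s 4096` keeps long arguments from being truncated. It does not reassemble calls that strace splits into `<unfinished ...>`/`resumed` pairs when processes interleave, so a failed `execve` may occasionally slip through; writing one file per process with `-ff` avoids such interleaving. The decoding of strace's C-style escapes via Python literal syntax is an approximation that covers common cases only.

```python
import ast
import re

# One double-quoted strace string, allowing backslash escapes inside.
_QUOTED = r'"(?:[^"\\]|\\.)*"'
# execve("<path>", ["<arg0>", "<arg1>", ...], ...
_EXECVE = re.compile(r'execve\((' + _QUOTED + r'), \[((?:' + _QUOTED + r'(?:, )?)*)\]')
_FAILED = re.compile(r'\)\s*= -1 ')


def parse_strace_execs(lines):
    """Yield (path, argv) for each successful execve in strace output."""
    for line in lines:
        if _FAILED.search(line):
            continue  # e.g. ENOENT while PATH is being searched
        m = _EXECVE.search(line)  # search(): lines may start with a PID
        if not m:
            continue
        # strace prints C-style escapes; for common cases (\" \\ \n, octal)
        # Python string-literal syntax is close enough to decode them.
        path = ast.literal_eval(m.group(1))
        argv = [ast.literal_eval(s) for s in re.findall(_QUOTED, m.group(2))]
        yield path, argv
```

The resulting argv lists still contain every program the build started, so they would need to be filtered down to compiler invocations before extracting flags.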
PVS-Studio, a proprietary static analyzer for C++ on Windows (and, AFAIK, recently Linux), seems to implement the second approach; details on how it does this are welcome. If there are other IDEs/tools that already do something similar to what I need, please share information about them.
There are the following ways to gather compilation parameters on Linux:
Overriding the `CC`/`CXX` environment variables. This is what the scan-build utility from the Clang Static Analyzer does. This method works reliably only with simple Make projects.
procfs: all information about processes is available under `/proc/PID/...`. Scanning it is slow compared with the lifetime of short-lived compiler processes, so you might not be able to capture information about every process in a build.
strace utility (built on the `ptrace` system call). Its output contains a lot of useful information, but it requires complicated parsing because the output of concurrently running processes is interleaved. If the project is not built with many parallel jobs, this is a fairly reliable way to gather information about the processes. It's used in PVS-Studio.
JSON Compilation Database in CMake. You can get all the compilation parameters by defining `-DCMAKE_EXPORT_COMPILE_COMMANDS=On`. This is a reliable method if the project does not depend on non-standard environment variables. Also, a CMake project may contain errors that produce incorrect JSON, even though the build itself is unaffected. It's supported in PVS-Studio.
Bear utility (function substitution using `LD_PRELOAD`). You can get a JSON Compilation Database for any project. However, since environment variables are not recorded, it will be impossible to run the analyzer for some projects. Also, you cannot use it with projects that already use `LD_PRELOAD` during the build. It's supported in PVS-Studio.
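Both the CMake and the Bear routes end in a `compile_commands.json` file, so the per-file flags can be read back with a few lines of code. This sketch assumes the Clang JSON Compilation Database format (entries with `directory`, `file` and either an `arguments` list or a shell-escaped `command` string) and POSIX shell quoting; `index_entries` and `preprocessor_flags` are illustrative names, not part of any tool.

```python
import json
import os
import shlex


def index_entries(entries):
    """Map normalised absolute source path -> compiler argv."""
    index = {}
    for entry in entries:
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            # "command" is one shell-escaped string; POSIX quoting assumed.
            argv = shlex.split(entry["command"])
        # "file" may be relative to "directory".
        source = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        index[source] = argv  # a file compiled twice keeps its last entry
    return index


def load_compile_db(path="compile_commands.json"):
    with open(path) as f:
        return index_entries(json.load(f))


def preprocessor_flags(argv):
    """Keep -D/-U/-I options, joined ("-Iinc") or separate ("-I inc")."""
    flags, rest = [], iter(argv[1:])  # argv[0] is the compiler
    for arg in rest:
        if arg in ("-D", "-U", "-I"):
            flags += [arg, next(rest, "")]
        elif arg.startswith(("-D", "-U", "-I")):
            flags.append(arg)
    return flags
```

Note that, as mentioned above, the database carries no environment variables, so anything the compiler picks up from the environment is not visible here.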
How PVS-Studio collects compilation information on Windows:
Visual Studio API to get the compilation parameters of standard projects;
MSBuild API to get the compilation parameters of standard projects;
Win API to get information on any compilation process, as Windows Task Manager does, for example.
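Passing `VERBOSE=true` to `make` displays all commands with all their parameters. It is not a built-in `make` option but a convention honoured by some Makefiles; CMake-generated Makefiles, for instance, print the full commands when it is set.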
You might want to look at Coverity. It attaches its tool to the compiler to capture everything the compiler receives. You could also override the `CC` or `CXX` environment variables to first collect everything and then call the compiler as usual.
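A minimal sketch of such a `CC`/`CXX` wrapper, assuming a POSIX system: it appends the working directory and the full argument list as one JSON line to a log file, then replaces itself with the real compiler via `os.execvp`. The environment variable names `REAL_CC` and `CC_LOG` are hypothetical, chosen for this example. It only helps if the build actually invokes the compiler through `$(CC)`/`$(CXX)`.

```python
#!/usr/bin/env python3
import fcntl
import json
import os
import sys


def log_invocation(argv, log_path):
    """Append one JSON record (cwd + argv) per line to log_path."""
    record = json.dumps({"directory": os.getcwd(), "arguments": argv})
    with open(log_path, "a") as f:
        # Exclusive lock so parallel make jobs don't interleave records;
        # the lock is released when the file is closed.
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(record + "\n")
        f.flush()


def main():
    # Hypothetical variables; REAL_CC should be an absolute path so the
    # wrapper does not end up calling itself through PATH.
    real = os.environ.get("REAL_CC", "/usr/bin/cc")
    log_path = os.environ.get("CC_LOG", "/tmp/cc-invocations.jsonl")
    argv = [real] + sys.argv[1:]
    log_invocation(argv, log_path)
    os.execvp(real, argv)  # replace this process with the real compiler


# A compiler call always has arguments; running the file bare does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Usage could look like `REAL_CC=/usr/bin/gcc CC_LOG=$PWD/cc.jsonl make CC=/path/to/cc_wrapper.py` (the script must be executable). Each logged line then has the same `directory`/`arguments` shape as a compilation database entry.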