Reputation: 548
There is a huge C++ project that is built using CMake and gcc 4.2.3. The application employs multiple processes.
The end goal is to make a list of all error messages that could ever be written to the log file. Information and debug messages are also written to this file.
I found that in some main.cpp (file where everything starts) there is a catch expression where writing to the file occurs. So I need to find throw expressions that satisfy the following criteria:
The project is very big and it is difficult to tell whether this part of code will ever be executed in this build. Some builds are using certain libs and others don't.
Maybe I'm wrong with the approach, but I think that the solution is a 2-step process:
Find all throw expressions in the compiled tree and emulate throwing. In fact, I see a problem here because conditions may be really involved, for example:
string error_msg;
enum Condition condition;  // actual value is determined elsewhere at run time
switch (condition)
{
    case CONDITION1: error_msg = "sadasda"; break;
    case CONDITION2: error_msg = "sadasds1111a"; break;
    case CONDITION3: error_msg = "sasdasadasda"; break;
    default: error_msg = "sadasda"; break;
}
throw logic_error(error_msg);
Maybe it's all wrong and a different approach should be taken. I would be glad to see your advice.
Upvotes: 2
Views: 978
Reputation: 95334
(Responding long after the question; a recent edit to that question popped this question to visibility for me).
OP has it basically right; you need a compiler-accurate parse of the source code, and you need to track the throw sequences to see what they do.
In fact, you need compiler-accurate parses of all the compilation units involved in the project, and you'll need all of them at once to navigate from one compilation unit to another to track the throws. This means a conventional compiler front end isn't the right starting place; those parse only one compilation unit at a time.
Then there's the bit about tracing the "throws". You need the control flow within each function/method to follow throws within the method, and then you need to track throws across method calls. For the latter, you need an accurate call graph. A standard compiler might give the intra-method control flow, but it won't compute a global call graph.
To get a good call graph, you need to resolve explicit calls from foo to bar, and you need to determine, for indirect calls through pointers, which methods/functions are possible targets of the call. You need to determine the same thing for polymorphic method calls (a special case of indirect calls). So you need a points-to analyzer.
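To make the polymorphic-call part concrete, here is a minimal sketch of class hierarchy analysis, a much cruder stand-in for a real points-to analyzer: it treats every class derived from the static type as a possible receiver. The `ClassInfo` model and all names are hypothetical, it assumes single inheritance and an acyclic hierarchy, and a real tool would fill the table from parsed sources.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical hand-built model of a class hierarchy (single inheritance only).
struct ClassInfo {
    std::string base;               // empty for a root class
    std::set<std::string> methods;  // virtual methods defined in this class
};
typedef std::map<std::string, ClassInfo> Hierarchy;

// Walk up from cls to the nearest class that defines method; "" if none does.
std::string resolve(const Hierarchy& h, std::string cls, const std::string& method) {
    while (!cls.empty()) {
        Hierarchy::const_iterator it = h.find(cls);
        if (it == h.end()) return "";
        if (it->second.methods.count(method)) return cls + "::" + method;
        cls = it->second.base;
    }
    return "";
}

// True if cls is ancestor itself or inherits from it.
bool derives_from(const Hierarchy& h, std::string cls, const std::string& ancestor) {
    while (!cls.empty()) {
        if (cls == ancestor) return true;
        Hierarchy::const_iterator it = h.find(cls);
        if (it == h.end()) return false;
        cls = it->second.base;
    }
    return false;
}

// Conservative answer to "which bodies can a call p->method() run,
// when p has static type static_type?": try every subclass as the receiver.
std::set<std::string> possible_targets(const Hierarchy& h,
                                       const std::string& static_type,
                                       const std::string& method) {
    std::set<std::string> targets;
    for (Hierarchy::const_iterator it = h.begin(); it != h.end(); ++it) {
        if (!derives_from(h, it->first, static_type)) continue;
        std::string target = resolve(h, it->first, method);
        if (!target.empty()) targets.insert(target);
    }
    return targets;
}
```

This over-approximates: it reports targets in classes that are never instantiated, which is exactly the imprecision a points-to analysis is meant to reduce.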
With local control flow and an accurate call graph, you can now find each initial throw, and track ("simulate") it from the throw site through the catch chains to see if it ultimately arrives at main (or at least at a call to a logging function). The throw-catch-test-rethrow pattern is fairly straightforward to track; you'll have trouble with complex catch clauses containing a lot of logic that eventually re-throws, where it is hard to determine the actual re-thrown exception or even whether something gets rethrown. Welcome to static analysis and the Turing tar pit.
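As a rough illustration of the tracking step, here is a sketch that propagates error messages up a call graph until nothing changes. Everything in it is an assumption made for the example: the `FunctionInfo` model, the single `catches_all` flag standing in for real catch-clause analysis, and the idea that each throw site has already been reduced to a message string.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-function summary; a real tool would compute it from the AST.
struct FunctionInfo {
    std::vector<std::string> callees;   // functions this one may call
    std::set<std::string> thrown_here;  // messages of throws written in this body
    bool catches_all;                   // true if it swallows everything its callees throw
};
typedef std::map<std::string, FunctionInfo> Program;
typedef std::set<std::string> Messages;

// Returns the messages that may escape from entry, by iterating to a fixed point.
Messages messages_escaping(const Program& program, const std::string& entry) {
    assert(program.count(entry) == 1);  // entry must be a known function

    // Start with the throws written directly in each body.
    std::map<std::string, Messages> escapes;
    for (Program::const_iterator f = program.begin(); f != program.end(); ++f)
        escapes[f->first] = f->second.thrown_here;

    bool changed = true;
    while (changed) {
        changed = false;
        for (Program::const_iterator f = program.begin(); f != program.end(); ++f) {
            if (f->second.catches_all) continue;  // nothing from callees gets through
            Messages& mine = escapes[f->first];
            for (std::size_t i = 0; i < f->second.callees.size(); ++i) {
                std::map<std::string, Messages>::const_iterator callee =
                    escapes.find(f->second.callees[i]);
                if (callee == escapes.end()) continue;  // unknown (e.g. library) function
                for (Messages::const_iterator m = callee->second.begin();
                     m != callee->second.end(); ++m)
                    if (mine.insert(*m).second) changed = true;
            }
        }
    }
    return escapes[entry];
}
```

Calling `messages_escaping(program, "main")` on such a model gives a candidate list of messages that can reach the top-level catch. The hard cases described above (catch clauses that inspect, wrap, or conditionally rethrow) are precisely what the boolean flag glosses over.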
So in fact you need a tool that is designed to do these things as well as they can be done.
Alas, I know of no tool as of this moment that will do all of that nicely, off the shelf, and I try to keep track of such things. (This is generally true of any specific static analysis somebody might want). So the question becomes, where do you get infrastructure that will let you accomplish this task as a custom job?
Clang can provide some of this; it will certainly parse and build ASTs for C++. After firing up LLVM, you will have intra-method control flow analysis. I think Clang can be configured to parse multiple compilation units, so that's a big step up from what using a compiler will offer you. I don't know what Clang offers for doing points-to analysis or building call graphs. You'll have to fill that in, and build custom code for "simulating" the throws.
Our DMS Software Reengineering Toolkit, used for program analysis and transformation, could be used for this. DMS can also parse full C++ in a compiler accurate way, and is designed to parse/process multiple compilation units simultaneously.
DMS does produce intra-method control flow analysis, and it has intra method-level data flow analysis. We presently don't have points-to analysis for C++, but DMS does have both points-to analysis and call-graph construction for C that could be pressed into service, that has been tested on applications with 15,000 (not a typo) compilation units in one image having some 50,000 functions and indirect calls tangled across all of this. (If Clang doesn't have this kind of machinery already, this is a huge difference in starting places). With that, then you get to build the throw simulation on top.
Having considered all this, my guess is that the work to do the above with Clang and/or DMS is significant. If your application is less than a million lines, I'd expect you would get done faster (if more sloppily) by just hunting for throw clauses using grep and hand-tracing them through the code yourself. You said your application was huge; it is hard to tell what that actually means without specific numbers. These tools work really well at scale, but aren't worth the effort when your problem is small. [What is interesting is that the boundary for "small" moves over time, as the tools get better].
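For the grep route, something as small as the following would give you the list of throw sites to hand-trace. It is only a sketch of a textual search (the function name is made up), so it will also flag `throw` inside comments, string literals, and old-style `throw()` exception specifications; it needs a C++11 compiler for `std::regex`, which is fine because it does not have to be built with the project's gcc 4.2.3.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns (1-based line number, line text) for every line that contains
// the keyword `throw` as a whole word. Purely textual: no parsing is done.
std::vector<std::pair<std::size_t, std::string> >
find_throw_lines(const std::string& source) {
    static const std::regex throw_keyword("\\bthrow\\b");
    std::vector<std::pair<std::size_t, std::string> > hits;
    std::istringstream in(source);
    std::string line;
    std::size_t line_number = 0;
    while (std::getline(in, line)) {
        ++line_number;
        if (std::regex_search(line, throw_keyword))
            hits.push_back(std::make_pair(line_number, line));
    }
    return hits;
}
```

Read each source file into a string, pass it in, and print the hits with the file name; the whole-word match keeps identifiers such as `rethrown` out of the list.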
Upvotes: 1
Reputation: 14064
Writing a valid C++ parser is indeed a daunting task, to say the least, and probably not the fastest way to get where you want.
Basically, what you want is to reuse an existing parser for your purposes, which is not easy either. You'd need to research various compiler plugins and static analysis tools. For example, the clang static analyzer seems (relatively!) easily extensible. Perhaps a simpler way would be to use an existing static C++ analyzer, like lint, and detect uncaught exceptions. Then you modify your main to stop catching the exceptions you're interested in and have a look at the list of uncaught exceptions. You're far from done, but you can start working from there. C++ lint is not free software, but AFAIK free alternatives (cppcheck, clang analyzer) don't have advanced exception analysis. Maybe Coverity could also be of interest; they have scripts and/or an SDK for writing extensions.
Another way would be to leak memory on purpose in your exception objects, and any good static analyzer will find the source of the leak at the point where the exception object was created, i.e. the throw site and maybe even points where you add info to the exception. I don't know if this is realistic with your code, but in this setup, I think free analyzers could work.
Anyway, I wish you luck, working with large codebases is never easy ;)
Upvotes: 2