I have a C program and I want to track all branch conditions which belong to an execution path corresponding to a concrete input. For example, consider a simple program:
```c
#include <stdio.h>
#include <string.h>

int test(const char *a) {
    if (strcmp(a, "123") == 0)
        return 0;
    if (strcmp(a, "123") < 0)
        return -1;
    else
        return 1;
}

int main(void) {
    const char *a = "1234";  /* the concrete input discussed below */
    return test(a);
}
```
With a = "1234", the program returns 1, and the corresponding path condition is strcmp(a, "123") > 0. I want to collect strcmp, "123", and the value the call returned (positive here; the C standard fixes only the sign of strcmp's result, not its magnitude). Is there any approach that allows me to do so? I first thought about working with some C parser, but it seems it's not that simple. To get the values of parameters we have to deal with pointer analysis or calls to external libraries, which I don't know how to solve.
Answer:
You don't get a choice about whether you need a parser: you need one (including a full preprocessor). What you especially don't want to do is roll your own parser; C is far more complicated than you think, and then you have to worry about the specific dialect of C of interest.
But a parser is hardly enough; you need a tool that can resolve names to declarations and expressions to types, extract flow graphs, do points-to analysis (especially for function pointers), and finally construct a call graph that takes all of that into account. In other words, you need a compiler-like framework to solve the problem.
You can perhaps bend GCC (hard; it really wants to be a compiler) or Clang (it wants to be one too, but is designed to allow other uses) to do this, but both want to compile individual translation units, whereas you need something that can handle the whole set of source files that make up a program (since C programs are typically built that way). Our DMS Software Reengineering Toolkit is designed specifically to support this kind of custom task and has all the requisite machinery. (DMS also supports specific dialects of C.)
Once you have chosen a framework, the code to navigate along the flow path of interest is not likely to be simple. The generic advice is: "parse the source, apply all of the above analyzers, and then walk the data structures using the tool-offered APIs", with particular focus on APIs that follow control-flow paths. None of that code is likely to fit nicely in an example, and it will vary considerably depending on which framework you choose. I don't know the details of GCC or Clang intimately, but I don't think they offer any easy means to write down a symbolic description of the path (e.g., a conjunction of the conditional expressions). I can tell you that DMS (by design) has explicit support for constructing symbolic formulas like this.
Whichever route you take, you'll face the educational cost of learning how to work with these tools. They are complex; expect a relatively long learning curve.