Reputation: 189626
I get the idea of a control flow graph; it involves nodes that are basic blocks (sequences of operations that always occur), connected by edges that represent jumps.
But how do you represent a subroutine call?
If I have two functions like this:
int tweedledee(void)
{
int x = 16;
return x + do_something();
}
int tweedledum(int n)
{
if (n < 0)
return n;
else
return n + do_something();
}
with both functions calling do_something()
, then I need a way to allow a jump from a block in tweedledee
to do_something
and then another jump back to tweedledee
, and a jump from a block in tweedledum
to do_something
and then back to tweedledum
, but there's never a jump from tweedledee
to do_something
and then to tweedledum
. (Or tweedledum
→ do_something
→ tweedledee
) So it seems like a plain directed graph wouldn't suffice to define these relationships... maybe I'm missing something.
Upvotes: 2
Views: 1647
Reputation: 533
Procedures make CFGs and static analysis in general quite complicated. There are different approaches for representing routine calls in control flow graphs.
One first and common solution is to create a CFG for each routine, and to split the "call nodes" (the basic block corresponding to "CALL do_something()" in tweedledee for example) into two nodes, the actual call block C and a block R for writing the return value.
An edge (normally of special type) is inserted between C and the initial block of the routine being called, and another one between the end block of the routine being called and R. A simple example:
void foo() { int x = bar(); }
int bar() { return 1; }
might be transformed to:
[init::foo] ==> [CALL bar()] [x = RETVAL(bar())] ==> [end::foo]
|| /\
\/ ||
[init::bar] ==> [ret = 1 (end::bar)]
If there is another call to bar(), e.g. from a routine
void foo2() { int y = bar(); }
then the following graph might be the result:
[init::foo] ==> [CALL bar()] [x = RETVAL(bar())] ==> [end::foo]
|| /\
\/ ||
[init::bar] ==> [ret = 1 (end::bar)]
/\ ||
|| \/
[init::foo2]==> [CALL bar()] [x = RETVAL(bar())] ==> [end::foo2]
The problem here: There are now paths in this graph (e.g. init::foo ==> CALL bar() ==> init::bar ==> ret = 1 ==> x = RETVAL(bar()) ==> end::foo2) which make no sense in the program. This is what you might mean by wondering whether a "plain directed graph" suffices. There are different approaches for this problem, e.g.: Make copies of a routine (here bar) for each invocation of it. This only helps if there is no real recursion, and is expensive in general. For static analysis it is often useful to "overapproximate" the number of paths by only using a fixed number of such copies.
The slides Interprocedural Analysis Lecture Notes (Stephen Chong) seem to be a good introduction. There are also quite good books about constructing such graphs, e.g. Principles of Program Analysis by Nielson et al.
Upvotes: 3