Reputation: 2245
The Red-Dragon book contains, page 529, algorithm 9.1, and algorithm for partitioning assembly code into basic blocks. The algorithm is essentially:
Input: Assembly, as a list of instructions.
Output: A list of basic blocks of instructions
Phase 1: Mark the leaders. The very first statement (of a procedure) is a leader. Any statement that is the target of a branch is a leader. Any statement that immediately follows a branch is a leader.
Phase 2: Assemble the blocks. To make a basic block, add a leader and all statements to the block until another leader is encountered. Do this until no statements are left.
End and return the blocks.
Then, later, they use this block structure to develop control flow analyses. The basic structure developed is a control flow graph; it is made by treating basic blocks as nodes in a graph and creating a directed edge from Block Bi to Bj if Bi ends in a jump to Bj.
Cooper, Harvey and Waterman (in Building a Control Flow Graph from Scheduled Code) point out that this algorithm for creating a control flow graph is insufficient if there is a jump to a location held in a memory address or register.
There are several questions this brings up. When can assembly contain a branch to a location in memory? What is scheduled code? Are there any other issues one should be aware of when building a control flow graph from x86? What are the best known algorithms/implementations for building control flow graphs directly from x86?
Upvotes: 3
Views: 1241
Reputation: 137272
There are several cases where a branch is performed on register's content (or variable content, for that matter)
Here is an example for function pointer call (64 bit Mac OS):
C code
int fptr_call( int (*ptr)(int) ) {
return (*ptr)( 3 );
}
Assembly
_fptr_call:
0000000100000e70 pushq %rbp
0000000100000e71 movq %rsp, %rbp
0000000100000e74 subq $0x10, %rsp
0000000100000e78 movl $0x3, %eax
0000000100000e7d movq %rdi, 0xfffffffffffffff8(%rbp)
0000000100000e81 movl %eax, %edi
; Call based on %rbp, copied from %rdi which is ptr
0000000100000e83 callq *0xfffffffffffffff8(%rbp)
0000000100000e86 addq $0x10, %rsp
0000000100000e8a popq %rbp
0000000100000e8b ret
0000000100000e8c nopl (%rax)
Upvotes: 4