I have an application that creates dumps of the .text segment of Win32 processes. It then divides the code into basic blocks. A basic block is a sequence of instructions that are always executed one after another (a jump, call or ret can only be the last instruction of such a block). Here is an example:
Basic block 1
mov ecx, dword ptr [ecx]
test ecx, ecx
je 00401013h
Basic block 2
mov eax, dword ptr [ecx]
call dword ptr [eax+08h]
Basic block 3
test eax, eax
je 0040100Ah
Basic block 4
mov edx, dword ptr [eax]
push 00000001h
mov ecx, eax
call dword ptr [edx]
Basic block 5
ret 000008h
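For reference, blocks like these are usually found with the "leader" approach: the first instruction, every known jump or call target, and every instruction right after a jump, call or ret starts a new block. A minimal Python sketch, assuming the instructions are already decoded into a hypothetical Insn record (address, length, mnemonic, optional static target):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Insn:
    addr: int                     # address of the instruction
    size: int                     # encoded length in bytes
    mnemonic: str                 # e.g. "mov", "je", "call", "ret"
    target: Optional[int] = None  # statically known jump/call target, if any


def ends_block(insn: Insn) -> bool:
    # jmp and all conditional jumps (je, jne, jg, ...) start with "j";
    # calls and returns also end a block, as in the listing above.
    return insn.mnemonic.startswith("j") or insn.mnemonic in ("call", "ret")


def split_basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Split linearly decoded code into basic blocks using leaders."""
    if not insns:
        return []
    leaders = {insns[0].addr}
    for insn in insns:
        if ends_block(insn):
            leaders.add(insn.addr + insn.size)   # the next instruction starts a block
            if insn.target is not None:
                leaders.add(insn.target)         # so does any known target
    blocks: List[List[Insn]] = []
    current: List[Insn] = []
    for insn in insns:
        if insn.addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(insn)
    if current:
        blocks.append(current)
    return blocks
```

Indirect jumps and calls have no static target, so only their fall-through becomes a leader.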
Now I would like to group these basic blocks into functions, i.e. determine which basic blocks form each function. What algorithm should I use? Keep in mind that a single function may contain several ret instructions. Also, how can I detect fastcall functions?
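The simplest algorithm for grouping blocks into functions would be:
1. Collect all addresses that are targets of call some_address instructions; each one is a function entry point.
2. Starting at such an address, look at the basic block there. If it ends with a ret instruction, you're done with the function; otherwise follow its successors (the target and the fall-through of a conditional jump, the target of an unconditional jump, or the instruction after a call) and repeat until every path ends in a ret. You'll need to recognize jumps that form loops so that your program itself does not hang in an infinite loop.
Problems: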
- indirect calls: call [some_address] instead of call some_address
- tail calls: jmp some_address instead of call some_address immediately followed by ret
- hidden calls: call some_address can be simulated with push return_address + jmp some_address, or with push return_address + push some_address + ret (a lone push some_address + ret just jumps to some_address)
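A minimal Python sketch of the steps above, assuming the blocks are already available in a hypothetical Block record (start address, mnemonic of the last instruction, fall-through address, static target). It treats every direct call target as a function entry and walks successors with a visited set so loops terminate. It does not handle any of the listed problems, so a tail-call jmp will pull the jumped-to function's blocks into the current one.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Block:
    start: int
    last: str                       # mnemonic of the block's last instruction
    fallthrough: Optional[int]      # start of the block that follows in memory
    target: Optional[int] = None    # statically known jump/call target


def successors(block: Block) -> List[int]:
    """Blocks that may run next inside the same function."""
    if block.last == "ret":
        return []                                   # the function ends here
    if block.last == "jmp":
        # Unconditional jump; an indirect jmp has no known target.
        return [block.target] if block.target is not None else []
    if block.last.startswith("j"):
        # Conditional jump: the taken target and the fall-through.
        return [a for a in (block.target, block.fallthrough) if a is not None]
    # A call (the callee is a separate function) or an ordinary instruction:
    # execution continues with the next block.
    return [block.fallthrough] if block.fallthrough is not None else []


def group_into_functions(blocks: Dict[int, Block]) -> Dict[int, Set[int]]:
    """Map each function entry (a direct call target) to the blocks reachable from it."""
    entries = {b.target for b in blocks.values()
               if b.last == "call" and b.target in blocks}
    functions: Dict[int, Set[int]] = {}
    for entry in sorted(entries):
        seen = {entry}              # the visited set keeps loops from hanging the walk
        work = deque([entry])
        while work:
            for nxt in successors(blocks[work.popleft()]):
                if nxt in blocks and nxt not in seen:
                    seen.add(nxt)
                    work.append(nxt)
        functions[entry] = seen
    return functions
```

Several ret instructions in one function are no problem here: each ret simply ends one path of the walk.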
You may use a heuristic to determine where functions start by looking for the most common prologue instruction sequence:
push ebp
mov ebp, esp
Again, this may not work if functions are compiled with the frame pointer omitted (i.e. they use esp instead of ebp to access their parameters on the stack, which is possible).
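If you work on the raw bytes of the dump, the prologue can be searched for directly: push ebp is 55 (hex), and mov ebp, esp is encoded either as 8B EC or as 89 E5. A rough Python sketch (find_prologues is a hypothetical helper); expect false positives, since these bytes can also occur inside other instructions or data:

```python
from typing import List

# push ebp = 55h; mov ebp, esp = 8Bh ECh (typical of MSVC) or 89h E5h (typical of GCC)
PROLOGUES = (b"\x55\x8b\xec", b"\x55\x89\xe5")


def find_prologues(code: bytes, base: int) -> List[int]:
    """Candidate function starts in a .text dump whose first byte is at address `base`."""
    hits = set()
    for pattern in PROLOGUES:
        pos = code.find(pattern)
        while pos != -1:
            hits.add(base + pos)
            pos = code.find(pattern, pos + 1)
    return sorted(hits)
```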
The compiler (e.g. MSVC++) may also pad the space between functions with int 3 instructions (opcode 0CCh), and that too can serve as a hint that a function begins right after the padding.
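A similarly rough sketch for the padding hint: report the address right after each run of 0CCh bytes. The minimum run length of 2 is an arbitrary assumption, and 0CCh bytes also occur inside instructions (MSVC debug builds, for example, load the immediate 0CCCCCCCCh to fill the stack), so treat the result as candidates only:

```python
from typing import List


def starts_after_padding(code: bytes, base: int, min_run: int = 2) -> List[int]:
    """Addresses that directly follow a run of at least `min_run` int 3 (0CCh) bytes."""
    starts = []
    run = 0
    for offset, byte in enumerate(code):
        if byte == 0xCC:
            run += 1
            continue
        if run >= min_run:
            starts.append(base + offset)
        run = 0
    return starts
```

Candidates that show up both here and in a prologue scan are much more trustworthy than either hint alone.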
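As for differentiating between the various calling conventions, it's perhaps easiest to look at the symbols (if you have them, of course). MSVC++ decorates names with different prefixes and suffixes, e.g. for 32-bit C functions: _name for __cdecl, _name@N for __stdcall and @name@N for __fastcall, where N is the number of bytes of arguments in decimal.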
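As a sketch, assuming MSVC's 32-bit decoration of C (extern "C") names, a symbol can be classified with a few regular expressions. The helper name convention_from_symbol is hypothetical, and C++-mangled names are left to a real demangler:

```python
import re
from typing import Optional, Tuple

# 32-bit MSVC decoration of C (extern "C") function names, N = argument bytes in decimal:
#   __cdecl    -> _name
#   __stdcall  -> _name@N
#   __fastcall -> @name@N
_FASTCALL = re.compile(r"@(\w+)@(\d+)")
_STDCALL = re.compile(r"_(\w+)@(\d+)")
_CDECL = re.compile(r"_(\w+)")


def convention_from_symbol(symbol: str) -> Optional[Tuple[str, str, Optional[int]]]:
    """Return (convention, undecorated name, argument bytes) or None if unrecognized."""
    m = _FASTCALL.fullmatch(symbol)
    if m:
        return ("fastcall", m.group(1), int(m.group(2)))
    m = _STDCALL.fullmatch(symbol)
    if m:
        return ("stdcall", m.group(1), int(m.group(2)))
    m = _CDECL.fullmatch(symbol)
    if m:
        # Caution: an undecorated name that merely starts with "_" looks the same.
        return ("cdecl", m.group(1), None)
    return None  # e.g. C++ names such as "?f@@YAXH@Z" need a demangler
```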
If you cannot extract this information from the symbols, you must analyze code to see how parameters are passed to functions and whether functions or their callers remove them from the stack.
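For the stack-cleanup part, a sketch that reads the two places where cleanup shows up in a listing like the one in the question: the operand of the function's ret (callee cleanup) and an add esp, N right after a call (caller cleanup). The helper names are hypothetical, and the parser only understands the simple text forms shown in the comments; callers sometimes clean up with pop instead, or batch several cleanups into one add.

```python
def parse_imm(text: str) -> int:
    """Parse an immediate as disassemblers print it: "000008h", "0x8" or "8"."""
    text = text.strip().lower()
    if text.endswith("h"):
        return int(text[:-1], 16)      # MASM-style hex, as in the question's listing
    if text.startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def callee_cleanup(ret_insn: str) -> int:
    """Bytes the function pops itself: "ret 000008h" -> 8, plain "ret" -> 0."""
    parts = ret_insn.split(None, 1)
    return parse_imm(parts[1]) if len(parts) == 2 else 0


def caller_cleanup(insn_after_call: str) -> int:
    """Bytes the caller removes, recognizing only an "add esp, N" right after the call."""
    parts = insn_after_call.lower().replace(",", " ").split()
    if len(parts) == 3 and parts[0] == "add" and parts[1] == "esp":
        return parse_imm(parts[2])
    return 0
```

A positive callee_cleanup points to stdcall, fastcall or thiscall; zero there, together with a positive caller_cleanup at the call sites, points to cdecl.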
Have a look at tools such as WinDasm or OllyDbg. The call and ret instructions mark function calls and returns. However, code does not run sequentially, and jumps can go anywhere. The target of call dword ptr [edx] depends on the value of the edx register, so in general you won't know where it goes without runtime debugging.
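To recognize fastcall functions, you have to look at how parameters are passed. MSVC's __fastcall puts the first two DWORD-sized (or smaller) parameters in the ecx and edx registers, respectively, and pushes the rest on the stack, whereas stdcall pushes all of them on the stack. In both conventions the callee removes the stack parameters. See this article for an explanation.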
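A rough callee-side check, sketched in Python: scan the function's entry block and report which of ecx and edx are read before anything writes them. The parser is a hypothetical, simplified one for Intel-syntax text such as the question's listing. A hit on ecx alone is ambiguous, because MSVC's thiscall also passes the this pointer in ecx; basic block 1 of the question, which starts with mov ecx, dword ptr [ecx], is such a case.

```python
import re
from typing import Iterable, Set

# ecx/edx and their sub-registers, mapped to the full 32-bit register.
_ALIASES = {"ecx": "ecx", "cx": "ecx", "cl": "ecx", "ch": "ecx",
            "edx": "edx", "dx": "edx", "dl": "edx", "dh": "edx"}
# Instructions whose first operand is only written, never read.
_WRITE_ONLY = {"mov", "movzx", "movsx", "lea", "pop"}


def _regs(operand: str) -> Set[str]:
    """ecx/edx mentioned in an Intel-syntax operand (whole words only, so 0Dh is not dh)."""
    return {_ALIASES[w] for w in re.findall(r"\b[a-z]+\b", operand.lower()) if w in _ALIASES}


def register_args(insns: Iterable[str]) -> Set[str]:
    """Return which of ecx/edx straight-line code reads before writing them."""
    read_first: Set[str] = set()
    written: Set[str] = set()
    for insn in insns:
        mnemonic, _, rest = insn.strip().lower().partition(" ")
        ops = [op.strip() for op in rest.split(",")] if rest.strip() else []
        reads: Set[str] = set()
        writes: Set[str] = set()
        if ops and mnemonic in _WRITE_ONLY and "[" not in ops[0]:
            writes |= _regs(ops[0])                  # e.g. "mov ecx, ..." overwrites ecx
            for op in ops[1:]:
                reads |= _regs(op)
        elif mnemonic in ("xor", "sub") and len(ops) == 2 and ops[0] == ops[1]:
            writes |= _regs(ops[0])                  # "xor ecx, ecx" just zeroes ecx
        else:
            for op in ops:
                reads |= _regs(op)                   # operands (and addresses) are read
            if ops and "[" not in ops[0] and mnemonic not in ("cmp", "test", "push"):
                writes |= _regs(ops[0])              # e.g. "add ecx, 4" also writes ecx
        if mnemonic == "call":
            writes |= {"ecx", "edx"}                 # both are volatile across calls
        read_first |= reads - written
        written |= writes
    return read_first
```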
You could use the presence of an enter instruction to denote the beginning of a function, or of the equivalent code that sets up a stack frame:
push ebp
mov ebp, esp
sub esp, (bytes for "local" stack space)
Later you'll find the opposite code (or a leave instruction) just before the ret:
mov esp, ebp
pop ebp
You can also use the number of bytes reserved for local stack space to identify local variables, which are accessed at negative offsets from ebp (e.g. [ebp-4]).
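A sketch of that idea in Python (frame_info is a hypothetical helper working on the same kind of text listing as the question): it recognizes either prologue form, reads the local stack size from sub esp or from the first operand of enter, and checks for a matching epilogue. Functions compiled without a frame pointer simply return None.

```python
from typing import List, Optional


def parse_imm(text: str) -> int:
    """Parse "10h", "0x10" or "16"."""
    text = text.strip().lower()
    if text.endswith("h"):
        return int(text[:-1], 16)
    if text.startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def frame_info(insns: List[str]) -> Optional[dict]:
    """Recognize an ebp frame; return its local stack size and whether an epilogue exists."""
    # Normalize "mov ebp, esp" -> "mov ebp esp" so comparisons ignore spacing and commas.
    norm = [" ".join(i.lower().replace(",", " ").split()) for i in insns]
    if norm[:2] == ["push ebp", "mov ebp esp"]:
        size = 0
        if len(norm) > 2 and norm[2].startswith("sub esp "):
            size = parse_imm(norm[2].split()[2])
        epilogue = "leave" in norm or ("mov esp ebp" in norm and "pop ebp" in norm)
        return {"locals": size, "epilogue": epilogue}
    if norm and norm[0].startswith("enter "):
        # "enter N, 0" does push ebp / mov ebp, esp / sub esp, N in one instruction.
        return {"locals": parse_imm(norm[0].split()[1]), "epilogue": "leave" in norm}
    return None
```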
Identifying thiscall, fastcall, etc. will take some analysis of the code just before the calls to the function's start address, and an evaluation of which registers are used and how the stack is cleaned up.
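A caller-side counterpart, again a simplified sketch with a hypothetical helper: for a block that ends in a call, count the pushes and list which of ecx/edx were loaded beforehand without being part of the call target. Registers loaded for unrelated reasons will show up too. Applied to basic block 4 of the question, it reports one push and ecx, which looks like an MSVC thiscall virtual call (object pointer in ecx, call through the vtable loaded into edx).

```python
import re
from typing import List


def _regs(operand: str) -> set:
    """ecx/edx named in an Intel-syntax operand (whole words only)."""
    return {w for w in re.findall(r"\b[a-z]+\b", operand.lower()) if w in ("ecx", "edx")}


def call_site_args(block: List[str]) -> dict:
    """Inspect a basic block that ends with a call instruction.

    Counts the pushes before the call and lists which of ecx/edx were loaded
    without being part of the call target, i.e. likely register arguments.
    """
    *body, call = block
    target_regs = _regs(call.strip().partition(" ")[2])   # e.g. edx in "call dword ptr [edx]"
    pushes = 0
    loaded = set()
    for insn in body:
        mnemonic, _, rest = insn.strip().lower().partition(" ")
        dst = rest.split(",")[0]
        if mnemonic == "push":
            pushes += 1
        elif mnemonic in ("mov", "lea", "xor") and "[" not in dst:
            loaded |= _regs(dst)
    return {"pushes": pushes, "register_args": loaded - target_regs}
```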