S1mple

How to find all the reachable labels in assembly files?

I'm writing a tool that splits assembly code into sections and labels, and I'm trying to add a recursive mode.

When I print the code of one specific label, and that code refers to other labels, recursive mode should also print the labels it refers to. For example:

.file sample.s
...
A:
    ...
    call B
    ...
B:
    ...
C:
    ...

For the code above, if I print label A in recursive mode, the code of both A and B should be printed. To do this, I have to find every label referenced on each line.

Some instructions, like call, lea and jmp, are obviously important, but it's not easy to list every case. Any ideas? Thanks for your help!
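A naive version of the per-line scan described above, as a minimal Python sketch. It assumes a label is an identifier followed by ':' at the start of a line and that comments start with ';' or '#'; split_blocks and referenced_labels are hypothetical helper names. It counts every operand token that names a known label as a reference, whatever the mnemonic, so it also picks up lea/mov operands.

```python
import re

# Assumptions: "name:" at line start defines a label; comments start with ';' or '#'.
LABEL_DEF = re.compile(r'^([A-Za-z_.$][\w.$]*)\s*:')
IDENT = re.compile(r'[A-Za-z_.$][\w.$]*')


def split_blocks(lines):
    """Group instruction lines under the most recent label definition."""
    blocks, order, current = {}, [], None
    for line in lines:
        code = re.split(r'[;#]', line, maxsplit=1)[0].strip()  # drop comments
        m = LABEL_DEF.match(code)
        if m:
            current = m.group(1)
            blocks[current] = []
            order.append(current)
            code = code[m.end():].strip()  # an instruction may follow the label
        if code and current is not None:  # lines before the first label are skipped
            blocks[current].append(code)
    return blocks, order


def referenced_labels(instr, known):
    """Naive: every operand token that names a known label is a reference."""
    parts = instr.split(None, 1)
    operands = parts[1] if len(parts) > 1 else ''
    return [tok for tok in IDENT.findall(operands) if tok in known]
```

For example, split_blocks on the sample file gives the blocks A, B and C, and referenced_labels('call B', blocks) returns ['B'].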

Answers (1)

Peter Cordes

So you want to print all code reachable from a given label, except by returning further up the call tree? (i.e. all other basic blocks of this function, all child functions, and tail-call siblings).
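If you have already built a graph mapping each label to the labels control can reach from it directly (branch and call targets plus the fall-through label), "reachable" in this sense is a plain graph search. A minimal Python sketch; the dict-of-lists graph format and the name reachable are assumptions. A ret simply contributes no edge, which is what keeps the search from walking back up the call tree.

```python
from collections import deque


def reachable(graph, start):
    """Breadth-first search over a label graph.

    graph: hypothetical dict mapping a label to the labels control can
    flow to directly (jump/branch targets, call targets, fall-through).
    Returns the reachable labels in discovery order, starting with start.
    """
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        label = queue.popleft()
        for succ in graph.get(label, ()):
            if succ not in seen:  # visit each label once, even in loops
                seen.add(succ)
                order.append(succ)
                queue.append(succ)
    return order
```

With graph = {'A': ['B'], 'B': [], 'C': ['A']}, starting at A prints A and B but not C, matching the example in the question.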

The normal / simplest way for execution to get from one label to another is to simply fall through, like:

   mov ecx, 123
looptop:           ; do {
   ...
   dec ecx
   jnz looptop     ; }while(--ecx)

Unless the last instruction before the next label is an unconditional jump (like jmp or ret, but not call, which can eventually return), you should also follow execution into that next block. A ret should end processing, a jmp can be followed if you want, and a jnz might fall through.

For conditional branches, you presumably need to follow both sides.
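The rules above can be sketched as a function that computes one block's direct successors. This is a minimal Python sketch under assumptions: the mnemonic sets are non-exhaustive Intel/GAS spellings, direct jmp targets are always followed, and an operand that isn't a known label (like jmp rdi) is treated as an indirect jump and not followed. block_successors is a hypothetical name.

```python
# Assumed, non-exhaustive mnemonic sets (Intel and GAS spellings).
UNCOND_JUMPS = {'jmp', 'jmpq', 'jmpl'}
RETURNS = {'ret', 'retq', 'retl', 'retn'}
CALLS = {'call', 'callq', 'calll'}


def is_cond_jump(mnem):
    """jcc (jnz, je, jrcxz, ...) and loop/loopcc: both sides are possible."""
    return (mnem.startswith('j') and mnem not in UNCOND_JUMPS) or mnem.startswith('loop')


def block_successors(instrs, next_label, known):
    """Labels reachable directly from one block.

    instrs: the block's instructions, comments stripped.
    next_label: the label that textually follows the block, or None.
    known: set of label names defined in the file.
    """
    succs = []

    def add(label):
        if label in known and label not in succs:
            succs.append(label)

    for instr in instrs:
        parts = instr.split(None, 1)
        if not parts:
            continue
        mnem = parts[0].lower()
        # Last operand token, so "jmp short looptop" still yields "looptop".
        target = parts[1].split()[-1] if len(parts) > 1 else ''
        if mnem in CALLS:
            add(target)        # child function; the call can return, keep going
        elif is_cond_jump(mnem):
            add(target)        # taken side; the not-taken side continues below
        elif mnem in UNCOND_JUMPS:
            add(target)        # an indirect target like rdi is not a known label
            return succs       # no fall-through past an unconditional jmp
        elif mnem in RETURNS:
            return succs       # ret ends processing
    if next_label is not None:
        add(next_label)        # fell off the end into the next label
    return succs
```

For the loop example, block_successors(['dec ecx', 'jnz looptop'], next_label, known) returns both looptop and the following label.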

Trying to trace through indirect jumps after code loads a function pointer into a register with a RIP-relative LEA or a MOV is probably too hard. Do you really want to be able to trace foo(callback_func, 123), and print both the code for foo and the code it might call at callback_func?

If the arg is passed in a register (as in the x86-64 calling conventions) and the function doesn't store it to the stack and reload it, then it's fairly easy to match that up with a jmp rdi after checking that there are no intervening writes to RDI. But if it's more complex, like a debug build storing RDI to the stack and reloading it somewhere else, you basically need an x86-64 simulator to trace the values.
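A crude Python sketch of the "no intervening writes to RDI" check. Assumptions, all hypothetical: Intel syntax with the destination as the first operand, the instructions form a straight-line path from the function's entry to the jmp rdi, any call clobbers RDI (it is call-clobbered in the x86-64 System V ABI), and string instructions write RDI implicitly. Other implicit writes are not modelled, and only a small set of read-only mnemonics is recognised.

```python
import re

RDI_ALIASES = {'rdi', 'edi', 'di', 'dil'}
READ_ONLY = {'cmp', 'test', 'push'}  # first operand is only read
# String instructions (optionally rep-prefixed) update RDI implicitly.
# Note: this also matches SSE movsd/cmpsd, which errs on the side of "written".
IMPLICIT_RDI = re.compile(r'^(rep\w*\s+)?(stos|movs|scas|cmps|ins)[bwdq]?\b')


def rdi_unchanged(instrs):
    """True if none of instrs appears to write RDI (heuristic, Intel syntax)."""
    for instr in instrs:
        text = instr.strip().lower()
        parts = text.split(None, 1)
        if not parts:
            continue
        mnem = parts[0]
        ops = [op.strip() for op in parts[1].split(',')] if len(parts) > 1 else []
        if mnem.startswith('call') or IMPLICIT_RDI.match(text):
            return False  # clobbered by a call or a string instruction
        if ops and ops[0] in RDI_ALIASES and mnem not in READ_ONLY:
            return False  # explicit write, e.g. mov rdi, [rsp + 8]
        if mnem == 'xchg' and len(ops) == 2 and ops[1] in RDI_ALIASES:
            return False  # xchg writes both operands
    return True
```

If it returns True for the path from foo's entry to its jmp rdi, the target is whatever the caller put in RDI before call foo (e.g. a lea rdi, [rip + callback_func]). A debug-build store and reload through the stack makes it return False, which is the case that would need a simulator.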

I think it might be better not to even attempt tracing through indirect jumps, rather than have something that works in simple cases and fails in others. So you should probably forget about lea, unless you're thinking about dumping data declarations for static data referenced with LEA or MOV.
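If you do want to dump static data, the referenced symbols can often be picked out of RIP-relative addressing modes textually. A minimal Python sketch; it assumes the Intel ([rip + sym]), NASM ([rel sym]) and AT&T (sym(%rip)) spellings, and rip_relative_symbols is a hypothetical name.

```python
import re

SYM = r'([A-Za-z_.$][\w.$]*)'
INTEL_RIPREL = re.compile(r'\[\s*rip\s*\+\s*' + SYM, re.IGNORECASE)  # [rip + sym]
NASM_REL = re.compile(r'\[\s*rel\s+' + SYM, re.IGNORECASE)            # [rel sym]
ATT_RIPREL = re.compile(SYM + r'(?:[+-]\d+)?\(%rip\)')                 # sym(%rip), sym+8(%rip)


def rip_relative_symbols(instr):
    """Symbols that an lea/mov instruction addresses RIP-relatively."""
    parts = instr.split(None, 1)
    if len(parts) < 2:
        return []
    mnem = parts[0].lower()
    if not (mnem.startswith('lea') or mnem.startswith('mov')):
        return []  # only LEA/MOV, as suggested above
    ops = parts[1]
    return INTEL_RIPREL.findall(ops) + NASM_REL.findall(ops) + ATT_RIPREL.findall(ops)
```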


Some int 0x80 or syscall instructions are noreturn (e.g. _exit or sigreturn), but most aren't. Which call it is depends on the value in RAX/EAX (and on the OS). EAX usually gets set shortly before a system call, so you might want to special-case the noreturn ones; otherwise you'll fall through past an exit into other code that shouldn't necessarily execute.
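A sketch of special-casing the noreturn system calls, assuming Linux: it scans backwards from the syscall or int 0x80 for the most recent write to EAX/RAX and only trusts an immediate mov eax, N or mov rax, N. The numbers are Linux's x86-64 and i386 numbers for rt_sigreturn/sigreturn, exit and exit_group; other operating systems differ. Anything it can't resolve is reported as "might return", so you keep falling through. is_noreturn_syscall is a hypothetical name.

```python
import re

# Linux syscall numbers that never return to the caller.
NORETURN_X86_64 = {15, 60, 231}     # rt_sigreturn, exit, exit_group (syscall)
NORETURN_I386 = {1, 119, 173, 252}  # exit, sigreturn, rt_sigreturn, exit_group (int 0x80)
EAX_ALIASES = {'rax', 'eax', 'ax', 'ah', 'al'}
SET_EAX = re.compile(r'^mov\s+[er]ax\s*,\s*(0x[0-9a-f]+|\d+)$')


def parse_imm(text):
    return int(text, 16) if text.startswith('0x') else int(text)


def is_noreturn_syscall(instrs, i):
    """Guess whether the system call at instrs[i] never returns (Linux only)."""
    insn = ' '.join(instrs[i].lower().split())
    if insn == 'syscall':
        table = NORETURN_X86_64
    elif insn == 'int 0x80':
        table = NORETURN_I386
    else:
        return False
    for prev in reversed(instrs[:i]):
        text = prev.strip().lower()
        m = SET_EAX.match(text)
        if m:
            return parse_imm(m.group(1)) in table
        parts = text.split(None, 1)
        if not parts:
            continue
        first_op = parts[1].split(',')[0].strip() if len(parts) > 1 else ''
        if first_op in EAX_ALIASES or parts[0].startswith('call'):
            return False  # EAX changed in a way we don't track: assume it returns
    return False
```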

The same applies to library function calls like call exit.
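Library calls can be handled the same way with a list of known noreturn functions. The list below is an assumption and far from complete (__assert_fail and __stack_chk_fail are glibc/GCC-specific), and is_noreturn_call is a hypothetical helper that strips @PLT (GAS) and wrt ..plt (NASM) decorations from the target.

```python
# Hypothetical, non-exhaustive set of C library functions that never return.
NORETURN_FUNCS = {'exit', '_exit', '_Exit', 'quick_exit', 'abort',
                  'longjmp', 'siglongjmp', '__stack_chk_fail', '__assert_fail'}


def is_noreturn_call(instr):
    """True for a direct call to a known noreturn function."""
    parts = instr.split(None, 1)
    if len(parts) != 2 or not parts[0].lower().startswith('call'):
        return False
    # "exit@PLT" -> "exit", "abort wrt ..plt" -> "abort"
    target = parts[1].strip().split('@')[0].split()[0]
    return target in NORETURN_FUNCS
```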
