valki

Reputation: 63

Decompiling assembler to C

I'm faced with finding a bug in a 10,000+ line assembly program, and with maintaining it in the future. It is assembler code for a Microchip PIC18 microcontroller. The code was written by a former colleague who is no longer available; he left no documentation, and the code is very poorly commented.

The code also uses a number of direct goto's and bra's, as well as relative ones ($+0x06 for instance), so I'm very afraid to edit it.

I would rather decompile his assembly source code into 'new' C-code (or C++) and work from there. I would like to know:

  1. Is decompiled assembler -> C code 100% functional?
  2. What will happen to the direct and relative goto's?
  3. Are there any free or cheap tools available to decompile PIC18 code?

Upvotes: 2

Views: 3802

Answers (3)

zxcdw

Reputation: 1649

While decompiling is possible to some extent, whether it's applicable in your situation depends on the available software. Personally, I would start by deciphering the code manually and roughly reconstructing it in C. Of course, it's a tedious task and will probably introduce errors and bugs along the way, but if you are supposed to maintain the codebase, it makes sense to make that task as easy as possible later on. Future maintainers would probably appreciate your work too.

Upvotes: 0

old_timer

Reputation: 71566

The non-MIPS PICs are extremely compiler-unfriendly; using C is slow and costly in terms of the limited resources it consumes.

Gotos and branches are how processors work; there is nothing wrong with them.

I would start by removing the relative branches and replacing them with labels. It should be a trivial check that you have added no new bugs: build the code before and after the change and compare the two binaries, which should be identical. Once the relative branches all use labels, you can add or remove code with less fear of breaking something else.
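As a rough sketch of that check in C, assuming both builds have already been read into memory as raw byte arrays (the function name `first_mismatch` is made up for this example, and any ordinary file-compare tool does the same job):

```c
#include <stddef.h>

/* Returns the offset of the first byte that differs between two images,
 * or -1 if they are identical in both length and content. */
long first_mismatch(const unsigned char *a, size_t a_len,
                    const unsigned char *b, size_t b_len)
{
    size_t common = a_len < b_len ? a_len : b_len;

    for (size_t i = 0; i < common; i++) {
        if (a[i] != b[i])
            return (long)i;
    }

    /* Same prefix but different sizes: the images diverge
     * where the shorter one ends. */
    if (a_len != b_len)
        return (long)common;

    return -1;
}
```

If this returns -1 for the image built before the label change and the one built after it, the refactoring did not alter the generated code.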

Decompilers don't work and can't work. It would be like trying to take the french fries you had for lunch out of your intestinal tract and restore the original potato (without the skin of course, unless they were skin-on fries). The original source has been cooked and processed a number of times before it becomes the asm that is fed to the assembler, and then it is cooked (well... warmed up) once more if you use objects and a linker. Too much material is lost along the way that cannot be recovered.

It is very possible to do a static binary translation and get something that is in the C language but is significantly harder to read and maintain than the asm, especially for architectures with flags. The translator is not going to be perfect at removing all of the dead code. Every add instruction becomes the add itself plus code to update each flag: detect the unsigned overflow and set the carry flag, else clear it; if the result equals zero, set the zero flag, else clear it; if the msbit is set, set the n bit, else clear it; and if there is a v bit, set it on signed overflow, else clear it. One simple line of asm becomes 10 lines of C code unless the translator can automatically remove that dead code.
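To make the flag overhead concrete, here is a rough sketch in C of what a translator might emit for a single 8-bit add on a PIC18-style core. The `cpu_t` struct and the name `translated_addwf` are invented for illustration and do not come from any real tool; the flag set (C, DC, Z, OV, N) follows the PIC18 STATUS register.

```c
#include <stdint.h>

/* Hypothetical model of the CPU state a translator would carry around. */
typedef struct {
    uint8_t w;                /* working register */
    uint8_t c, dc, z, ov, n;  /* STATUS flags, one field each for clarity */
} cpu_t;

/* One "add file register to W, result in W" instruction, fully expanded. */
void translated_addwf(cpu_t *cpu, uint8_t f)
{
    uint8_t  a    = cpu->w;
    uint16_t wide = (uint16_t)(a + f);   /* keep the 9th bit for carry */
    uint8_t  r    = (uint8_t)wide;

    cpu->c  = wide > 0xFF;                          /* carry out of bit 7 */
    cpu->dc = ((a & 0x0F) + (f & 0x0F)) > 0x0F;     /* carry out of bit 3 */
    cpu->z  = (r == 0);                             /* result is zero */
    cpu->n  = (r & 0x80) != 0;                      /* msbit of the result */
    /* Signed overflow: operands share a sign, result has the other one. */
    cpu->ov = ((~(a ^ f) & (a ^ r)) & 0x80) != 0;
    cpu->w  = r;
}
```

Most of those flag updates are never read before the next arithmetic instruction overwrites them, which is exactly the dead code a translator has to prove away before the output becomes readable.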

Upvotes: 2

unwind

Reputation: 399989

Here's how I see it:

  1. No, I wouldn't expect that to be possible, since there's no guarantee that C can exactly express everything a particular CPU can do at the assembly level in a way that is straightforward enough to implement in a decompiler. Sure, they're both Turing complete, but the devil is certainly in the details here.
  2. That's one particular issue, for instance. It's certainly possible in assembly to jump e.g. into the middle of what would be a C statement, which makes it complicated to reverse.
  3. I don't think so, no.
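As a small illustration of point 2, here is a hypothetical C routine (the name `run_from` and its behaviour are invented for this sketch) that mimics an assembly routine with a second entry point in the middle of its loop. It is legal C, but a mechanical translation full of jumps like this is no easier to follow than the assembly it came from.

```c
/* entry == 0 starts at the top of the loop;
 * entry == 1 jumps straight into the middle of the loop body,
 * the way a stray branch in hand-written assembly might. */
int run_from(int entry, int count)
{
    int acc = 0;

    if (entry == 1)
        goto mid_body;      /* lands inside the loop, skipping the first half */

    while (count > 0) {
        acc += 10;          /* first half of the loop body */
mid_body:
        acc += 1;           /* second half, also reachable from outside */
        count--;
    }
    return acc;
}
```

With `count` set to 2, entering at the top yields 22, while entering at `mid_body` yields 12 because the first `acc += 10` is skipped.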

For more information about the problem space, check out the Boomerang project's FAQ.

Upvotes: 4
