BigSum

Can dumped i386 assembly code be recompiled for PPC?

I used Apple's built-in "otool" command with the "-Vvtd" switches to dump a Mach-O i386 binary and redirected the output to a .s file. I have tried, unsuccessfully, to reassemble the code on a PPC machine with both nasm and the GAS assembler (the "as" binary in the i386 directory of gcc/darwin, and the "as" binary in the ppc directory as well). The otool output looks something like this:

some_topmost_label:
(__TEXT,__text) section
_default_pager:
00112000    pushl   %ebp
00112001    movl    %esp,%ebp
00112003    pushl   %edi
00112004    pushl   %esi
00112005    pushl   %ebx
00112006    subl    $0x3c,%esp
00112009    movl    _default_pager_internal_count,%ebx
0011200f    addl    _default_pager_external_count,%ebx
00112015    leal    0x00000004(,%ebx,4),%ebx

There is also a data section, which looks like this:

...

(__DATA,__data) section
00421000    02 00 00 00 04 00 00 00 00 40 00 00 28 64 65 66

...

00449bc0    50 00 3d 00 00 00 00 00 00 00 00 00 00 00 00 00 
00449bd0    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 

...

I intend to run the binary on a PPC Mac, hence the effort to reassemble it. I have tried removing the addresses in the leftmost column to make the syntax more "AT&T"-style, and I have also tried leaving them in. I DO NOT want to make any edits to the existing code structure (this is not a reverse-engineering effort per se, just some customization). If I do have to edit anything, I want the edits limited strictly to what is needed to make the existing, unadulterated i386 code run as is on PPC.

I will very much appreciate your help.

Regards

Upvotes: 0

Views: 330

Answers (2)

BigSum

Decompilers can produce C source files (I have tried this), which can then be compiled from source on a different architecture (I have tried that too). The results were dicey at best. I'm still working on it and will likely be doing so for some time.

Alternatively, an emulator can be used to run an i386 binary on PPC. This is a quicker route, but potentially a less effective one.

I also consider it confirmed that assembly-to-assembly translation is the most painful route, compared with using C as an intermediate language (decompiling the i386 binary to C and recompiling the C on the target architecture).

One caveat with decompiling: what if it produces a quarter-million lines of code? You may need a team :)

Upvotes: 1

user2404501

In assembly language, every "statement" is an instruction that the processor can execute. The instructions are represented in a human-readable text format (if you're the right kind of human) but still, every instruction name (e.g. movl) and register (e.g. %esp) and memory reference (e.g. 0x00000004(,%ebx,4)) that exists in assembly directly corresponds to an implementation detail of the processor.

So every processor really has its own assembly language, and dumping and re-assembling doesn't get you anywhere. That holds even within a family of related processors: if you take some 32-bit x86 code that was compiled with SSE3 optimizations enabled and dump it, you'll get assembly code containing SSE3 instructions. Re-assembling it won't give you a program that runs on a slightly older x86-32 processor without SSE3 support.

If your executable is old enough, there is a chance that it's a "fat binary". During the period when Apple supported both PPC and x86 Macs, the compiled PPC and x86 code was often packed together into a single file. Judging by this answer, you can detect fat binaries with the file command.

But chances are you'll have to do a lot more work than you expected.

PPC doesn't have a movl instruction. In fact it has no mov that can touch memory at all: it is a load/store architecture, so loads and stores are separate instructions (register-to-register copies use mr). It doesn't have a dedicated stack register like %esp either, although by convention r1 serves as the stack pointer. And it has nothing like the addressing mode in 0x00000004(,%ebx,4), which multiplies a register by 4 and adds the constant 4, all within one instruction. On PPC you need one instruction to shift the register left by 2 (*4 = <<2) and another to add the constant; a small constant like 4 fits in the 16-bit immediate field of addi, but a larger one would first have to be loaded into a separate register, costing further instructions. This is not a matter of whether the instructions are written in "source form" or "binary form". It's a matter of the instructions in the original code not existing at all on PPC.

Upvotes: 4
