Reputation: 83
I would like to ask you how to determine in which ISA (ARM/Thumb/Thumb-2) an instruction is encoded?
First of all, I tried to do it following the instructions here (section 4.5.5).
However, when I use readelf -s ./arm_binary
, and arm_binary
was built in release mode, it appears that there is no .symtab
in the binary. And anyway, I don't understand how to use this command to find the type for the instructions.
Secondly, I know the other way to differentiate is to look at the PC address for the ARM/Thumb instruction. If it is even then it is a Thumb instruction, if not - then ARM. But how can I do this without loading the file to memory? When I parse the sections of the file and find the execute section, all that I have is the start (offset) location in the file and the file-offset is always even, and it will be always even because we have instruction of size equal to 2 or 4...
Finally, the last way to check is to detect BX Rm
, extract the value from Rm
, and then check if that address in Rm
is it even or not. But, this may be difficult because for this I would need to emulate the whole program.
So what is the correct way to identify the ISA for disassembly?
Thank you for your attention and I hope you will help me.
Upvotes: 0
Views: 1068
Reputation: 25579
I don't believe it's possible to tell, in a mixed mode binary, without inspecting the instructions as you describe.
If the whole file will be one ISA or the other, then you can determine the ISA of the entry point by running this:
readelf -h ./arm_binary
And checking whether the entry point is even or odd.
However, what I would do is simply disassemble it both ways, and see what looks right. As long as you start the disassembly at the start of a function (or any 4-byte boundary), then this will work fine. Most code will produce nonsense when disassembled in the wrong ISA.
Upvotes: 2