Reputation: 23
I wanted to learn more about using ptrace with x86_64 binaries and disassembling instructions. The goal is to check whether a byte is one of the instruction prefixes.
I found some information in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (volume 2, chapter 2).
Section 2.1.1, INSTRUCTION PREFIXES, shows the following prefixes:
[0x26] ES segment override prefix
[0x36] SS segment override prefix
[0x2E] CS segment override prefix or Branch not taken
[0x3E] DS segment override prefix or Branch taken
[0x64] FS segment override prefix
[0x65] GS segment override prefix
[0x66] Operand-size override prefix
[0x67] Address-size override prefix
[0xF0] LOCK prefix
[0xF2] REPNE/REPNZ prefix or BND prefix
[0xF3] REP or REPE/REPZ prefix
Visually, this chart shows prefixes in yellow.
To find out efficiently whether a byte is a prefix, I want to check whether it can be done with bitwise operations.
Take 0x26, 0x36, 0x2E and 0x3E as a group. In base 2 (00100110, 00110110, 00101110 and 00111110), these numbers share a common pattern: 001XX110.
A bitwise AND with 11100111 (0xE7) clears the two varying bits, so comparing the result with 0x26 tells me whether my byte is in this group.
Great. Now, if I take a second group containing 0x64, 0x65, 0x66 and 0x67 (01100100, 01100101, 01100110, 01100111), I find another common pattern: 011001XX.
Then a bitwise AND with 11111100 (0xFC), compared with 0x64, tells me whether the byte is in the second group.
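As a sanity check of the two masks, here is a minimal C sketch that compares each mask test with an explicit list for all 256 byte values. The helper names (`is_group1`, `is_group2`, `verify_masks`) are made up for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Group 1: 0x26, 0x2E, 0x36 and 0x3E share the pattern 001XX110. */
static bool is_group1(uint8_t byte)
{
    return (byte & 0xE7) == 0x26;
}

/* Group 2: 0x64 to 0x67 share the pattern 011001XX. */
static bool is_group2(uint8_t byte)
{
    return (byte & 0xFC) == 0x64;
}

/* Compare both mask tests with an explicit list for every byte value. */
static void verify_masks(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t b = (uint8_t)v;
        bool g1 = (b == 0x26 || b == 0x2E || b == 0x36 || b == 0x3E);
        bool g2 = (b >= 0x64 && b <= 0x67);
        assert(is_group1(b) == g1);
        assert(is_group2(b) == g2);
    }
}
```

Calling `verify_masks()` once should pass without tripping an assertion, which means neither mask lets an unrelated byte through.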
The problem comes with the remaining instruction prefixes (0xF0, 0xF2 and 0xF3): there is no common pattern that matches only these three. A bitwise AND with 11111100 (0xFC) would also let the byte 0xF1 through.
One solution would be to check afterwards that the byte isn't 0xF1.
So, a possible implementation in C would be:
if ((byte & 0xE7) == 0x26) {
    /* This `byte` is an ES, SS, CS or DS segment override prefix */
}
if ((byte & 0xFC) == 0x64) {
    /* This `byte` is an FS, GS, operand-size or address-size override prefix */
}
if ((byte & 0xFC) == 0xF0) {
    if (byte != 0xF1) {
        /* This `byte` is a LOCK, REPN(E/Z) or REP(_/E/Z) prefix */
    }
}
Coming from Intel, I would expect that this last group could be checked in only one operation.
Then, the final question is: Can I check in one operation if the byte is 0xF0, 0xF2 or 0xF3?
Upvotes: 2
Views: 395
Reputation: 37222
Then, the final question is: Can I check in one operation if the byte is 0xF0, 0xF2 or 0xF3?
The closest you can get to one instruction is something like:
;ecx = the byte
bt [table],ecx ;Is the byte F0, F2 or F3?
jc .isF0F2orF3 ; yes
However, sometimes a prefix isn't considered a prefix (e.g. the pause instruction, which is encoded like rep nop for compatibility with old CPUs).
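The contents of `table` are not shown in the snippet above. A minimal C sketch of the same bit-test idea, assuming the table is a 256-bit bitmap with one bit per byte value (the names `prefix_bitmap`, `is_f0_f2_f3` and `check_bitmap` are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit n of the bitmap stands for byte value n and is stored
 * in word n / 32 at bit position n % 32. Only 0xF0, 0xF2 and 0xF3 are set,
 * and all three fall in word 7. */
static const uint32_t prefix_bitmap[8] = {
    [7] = (1u << (0xF0 % 32)) | (1u << (0xF2 % 32)) | (1u << (0xF3 % 32))
};

/* One table lookup plus a shift and a mask, with no chain of comparisons. */
static bool is_f0_f2_f3(uint8_t byte)
{
    return (prefix_bitmap[byte / 32] >> (byte % 32)) & 1u;
}

/* Exhaustive check against the explicit list of three bytes. */
static void check_bitmap(void)
{
    for (unsigned v = 0; v < 256; v++) {
        bool expected = (v == 0xF0 || v == 0xF2 || v == 0xF3);
        assert(is_f0_f2_f3((uint8_t)v) == expected);
    }
}
```

Setting more bits in the same bitmap would extend the test to all eleven prefix bytes at the same cost per lookup.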
Also note that for a high speed disassembler the fastest approach is likely "jump table driven", where one register points to the table corresponding to the decoder's state and another register contains the next byte of the instruction, like:
;ebx = address of table corresponding to the decoder's current state
movzx eax,byte [esi] ;eax = next byte of the instruction
inc esi ;esi = address of byte after the next byte of this instruction
jmp [ebx+eax*4] ;Go to the code that figures out what to do
In this case, some of the pieces of code jumped to would set some flags without changing the current table (e.g. the entry for 0xF3 in the initial table would cause a jump to code that sets a "rep prefix was seen" flag), and some of the pieces of code jumped to would switch to a different table (e.g. the entry for 0x0F in the initial table would cause a jump to code that changes EBX to point to a completely different table used for all instructions that begin with an 0x0F, ...); and some of the pieces of code jumped to would display an instruction (and reset the state of the decoder).
For example, for pause the code might be:
table0entryF3:
    or dword [prefixes],REP
    movzx eax,byte [esi]                  ;eax = next byte of the instruction
    inc esi                               ;esi = address of byte after the next byte
    jmp [ebx+eax*4]

table0entry90:
    mov edx,instructionNameString_NOP
    test dword [prefixes],REP             ;Was it a PAUSE or NOP?
    je doneInstruction_noOperands         ; NOP, current name is right
    and dword [prefixes],~REP             ; PAUSE, pretend the REP prefix wasn't there
    mov edx,instructionNameString_PAUSE   ; and use the right name
    jmp doneInstruction_noOperands

doneInstruction_noOperands:
    call displayPrefixes
    call displayInstructionName
    mov dword [prefixes],0                ;Reset prefixes
    mov ebx,table0                        ;Switch current table back to the initial table
    movzx eax,byte [esi]                  ;eax = first byte of next instruction
    inc esi                               ;esi = address of byte after the next byte
    jmp [ebx+eax*4]
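For readers more comfortable with C, the same table-driven idea can be roughly sketched with an array of function pointers. This is only an illustration under simplifying assumptions: a loop does the dispatching instead of each handler jumping to the next one, only the 0xF3 and 0x90 entries are filled in, there is no second table for 0x0F, and every name here (`struct decoder`, `on_f3`, `on_90`, `decode_one`, `demo`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFIX_REP 0x1u

struct decoder;
typedef void (*handler_fn)(struct decoder *);

struct decoder {
    const uint8_t *ip;        /* next byte to read (the role of ESI) */
    const handler_fn *table;  /* current table (the role of EBX) */
    unsigned prefixes;        /* prefix flags seen so far */
    const char *name;         /* set once an instruction is complete */
};

/* 0xF3: remember the prefix and stay in the same table. */
static void on_f3(struct decoder *d) { d->prefixes |= PREFIX_REP; }

/* 0x90: NOP, or PAUSE if a REP prefix came first. */
static void on_90(struct decoder *d)
{
    if (d->prefixes & PREFIX_REP) {
        d->prefixes &= ~PREFIX_REP;  /* pretend the REP prefix wasn't there */
        d->name = "pause";
    } else {
        d->name = "nop";
    }
}

/* Placeholder for every byte this sketch does not decode. */
static void on_other(struct decoder *d) { d->name = "(not handled)"; }

static handler_fn table0[256];

static void init_table0(void)
{
    for (size_t i = 0; i < 256; i++)
        table0[i] = on_other;
    table0[0xF3] = on_f3;
    table0[0x90] = on_90;
}

/* Decode one instruction; returns NULL if the input ends first. */
static const char *decode_one(struct decoder *d, const uint8_t *end)
{
    d->table = table0;   /* reset to the initial table */
    d->prefixes = 0;     /* reset prefixes */
    d->name = NULL;
    while (d->name == NULL && d->ip < end) {
        uint8_t byte = *d->ip++;
        d->table[byte](d);
    }
    return d->name;
}

/* Usage: the bytes 90, F3 90 decode as "nop" and then "pause". */
static void demo(void)
{
    static const uint8_t code[] = { 0x90, 0xF3, 0x90 };
    struct decoder d = { .ip = code };
    init_table0();
    assert(strcmp(decode_one(&d, code + sizeof code), "nop") == 0);
    assert(strcmp(decode_one(&d, code + sizeof code), "pause") == 0);
    assert(decode_one(&d, code + sizeof code) == NULL);
}
```

A handler for 0x0F would assign a different array to `d->table` instead of setting `d->name`, which corresponds to changing EBX in the assembly version.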
Upvotes: 2