0x4E84

Reputation: 23

Intel instruction prefixes: optimising the prefix check

I wanted to learn more about ptrace's functions with x86_64 binaries by disassembling instructions. The goal is to check whether a byte is one of the instruction prefixes.

I found some information in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (volume 2, chapter 2).

Section 2.1.1, INSTRUCTION PREFIXES, shows the following prefixes:

Visually, this chart shows prefixes in yellow.

To find out whether a byte is a prefix, I would like to be efficient and check whether it can be done with bitwise operations.

Take 0x26, 0x36, 0x2E and 0x3E as a group. In base 2 (00100110, 00110110, 00101110 and 00111110), these numbers share a common pattern: 001XX110.

A bitwise AND with 11100111 (0xE7), compared against 0x26, tells me whether my byte is in this group.

Great. Now, if I take a second group containing 0x64, 0x65, 0x66 and 0x67 (01100100, 01100101, 01100110, 01100111), I find another common pattern: 011001XX.

Then a bitwise AND with 11111100 (0xFC), compared against 0x64, tells me whether the byte is in the second group.

The problem comes with the remaining instruction prefixes (0xF0, 0xF2 and 0xF3): they do not share such a pattern. A bitwise AND with 11111100 (0xFC) would also let the byte 0xF1 through.

One solution would be to check afterwards that the byte isn't 0xF1.

So, a possible implementation in C would be:

if ((byte & 0xE7) == 0x26) {
    /* This `byte` is an ES, SS, CS or DS segment override prefix */
}
if ((byte & 0xFC) == 0x64) {
    /* This `byte` is an FS, GS, operand-size or address-size override prefix */
}
if ((byte & 0xFC) == 0xF0) {
    if (byte != 0xF1) {
        /* This `byte` is a LOCK, REPNE/REPNZ or REP/REPE/REPZ prefix */
    }
}
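As a sanity check on the three masks, here is a minimal sketch that wraps each test in a small function and counts how many of the 256 byte values are classified as prefixes. The function names are hypothetical and only serve the illustration; the masks and constants are the ones derived above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; the masks are the ones derived above. */

/* ES (0x26), CS (0x2E), SS (0x36), DS (0x3E): pattern 001XX110. */
bool is_legacy_segment_prefix(uint8_t byte)
{
    return (byte & 0xE7) == 0x26;
}

/* FS (0x64), GS (0x65), operand-size (0x66), address-size (0x67): pattern 011001XX. */
bool is_fs_gs_or_size_prefix(uint8_t byte)
{
    return (byte & 0xFC) == 0x64;
}

/* LOCK (0xF0), REPNE/REPNZ (0xF2), REP/REPE/REPZ (0xF3); 0xF1 is excluded explicitly. */
bool is_lock_or_rep_prefix(uint8_t byte)
{
    return (byte & 0xFC) == 0xF0 && byte != 0xF1;
}

bool is_prefix(uint8_t byte)
{
    return is_legacy_segment_prefix(byte)
        || is_fs_gs_or_size_prefix(byte)
        || is_lock_or_rep_prefix(byte);
}

/* Counts how many of the 256 possible byte values are classified as prefixes. */
int count_prefix_bytes(void)
{
    int count = 0;
    for (int b = 0; b < 256; b++)
        count += is_prefix((uint8_t)b);
    return count;
}
```

If the masks are right, `count_prefix_bytes()` returns 11 (4 + 4 + 3), which means no byte other than the intended ones slips through.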

Coming from Intel, I would expect this last group to be checkable in a single operation.

Then, the final question is: Can I check in one operation if the byte is 0xF0, 0xF2 or 0xF3?

Upvotes: 2

Views: 395

Answers (1)

Brendan

Reputation: 37222

Then, the final question is: Can I check in one operation if the byte is 0xF0, 0xF2 or 0xF3?

The closest you can get to one instruction is something like:

                     ;ecx = the byte
    bt [table],ecx   ;Is the byte F0, F2 or F3?
    jc .isF0F2orF3   ; yes
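For readers working in C, a rough counterpart of the bit-test idea is a 256-bit bitmap with one bit per possible byte value. This is only a sketch: the table layout (32-bit words, bit n stored in word n / 32) and the names are assumptions chosen to mirror how `bt` indexes memory with a 32-bit register, and a compiler will not necessarily turn it into a single instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit bitmap: bit n lives in word n / 32, at position n % 32.
 * Word 7 covers the byte values 0xE0..0xFF. */
static const uint32_t f0_f2_f3_table[8] = {
    0, 0, 0, 0, 0, 0, 0,
    (1u << (0xF0 & 31)) | (1u << (0xF2 & 31)) | (1u << (0xF3 & 31))
};

/* Returns true only for 0xF0, 0xF2 and 0xF3. */
bool is_f0_f2_or_f3(uint8_t byte)
{
    return (f0_f2_f3_table[byte >> 5] >> (byte & 31)) & 1u;
}
```

Setting more bits in the same table would extend the test to every prefix byte at no extra cost per lookup.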

However, sometimes a prefix isn't considered a prefix (e.g. the pause instruction, which is encoded like rep nop for compatibility with old CPUs).

Also note that for a high speed disassembler the fastest approach is likely "jump table driven", where one register points to the table corresponding to the decoder's state and another register contains the next byte of the instruction, like:

                          ;ebx = address of table corresponding to the decoder's current state
    movzx eax,byte [esi]  ;eax = next byte of the instruction
    inc esi               ;esi = address of byte after the next byte of this instruction
    jmp [ebx+eax*4]       ;Go to the code that figures out what to do

In this case, some of the pieces of code jumped to would set some flags without changing the current table (e.g. the entry for 0xF3 in the initial table would cause a jump to code that sets a "rep prefix was seen" flag). Others would switch to a different table (e.g. the entry for 0x0F in the initial table would cause a jump to code that changes EBX to point to a completely different table used for all instructions that begin with 0x0F, ...). Others again would display an instruction and reset the state of the decoder.
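The same three kinds of handler can be sketched in C with tables of function pointers. Everything here is a toy under stated assumptions: the struct, the handler names and the `PREFIX_REP` value are hypothetical, and every byte that is not 0xF3 or 0x0F is treated as a complete instruction, which a real decoder would of course not do.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFIX_REP 0x1u               /* hypothetical flag value */

struct decoder;
typedef void (*handler_fn)(struct decoder *);

struct decoder {
    const handler_fn *table;          /* current table, the role EBX plays above */
    unsigned prefixes;                /* prefix flags seen so far */
    unsigned last_prefixes;           /* flags of the most recently finished instruction */
    unsigned finished;                /* number of instructions completed */
};

static handler_fn table0[256];        /* initial table */
static handler_fn table0f[256];       /* table used after a 0x0F byte */

/* Sets a flag and leaves the current table unchanged. */
static void seen_rep(struct decoder *d) { d->prefixes |= PREFIX_REP; }

/* Switches to a different table. */
static void seen_0f(struct decoder *d) { d->table = table0f; }

/* Stands in for "display an instruction": records it and resets the state. */
static void finish(struct decoder *d)
{
    d->last_prefixes = d->prefixes;
    d->finished++;
    d->prefixes = 0;
    d->table = table0;
}

void decoder_init(struct decoder *d)
{
    for (int i = 0; i < 256; i++) {
        table0[i] = finish;           /* toy default: any other byte ends an instruction */
        table0f[i] = finish;
    }
    table0[0xF3] = seen_rep;
    table0[0x0F] = seen_0f;
    d->table = table0;
    d->prefixes = 0;
    d->last_prefixes = 0;
    d->finished = 0;
}

void decode(struct decoder *d, const uint8_t *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        d->table[code[i]](d);         /* counterpart of jmp [ebx+eax*4] */
}
```

The indirect call per byte replaces any chain of comparisons, so the cost of handling a byte does not depend on how many prefixes or opcodes the tables know about.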

For example, for pause the code might be:

table0entryF3:
    or dword [prefixes],REP
    movzx eax,byte [esi]                ;eax = next byte of the instruction
    inc esi                             ;esi = address of byte after the next byte
    jmp [ebx+eax*4]

table0entry90:
    mov edx,instructionNameString_NOP
    test dword [prefixes],REP           ;Was it a PAUSE or NOP?
    je doneInstruction_noOperands       ; NOP, current name is right
    and dword [prefixes],~REP           ; PAUSE, pretend the REP prefix wasn't there
    mov edx,instructionNameString_PAUSE ;        and use the right name
    jmp doneInstruction_noOperands

doneInstruction_noOperands:
    call displayPrefixes
    call displayInstructionName
    mov dword [prefixes],0              ;Reset prefixes
    mov ebx,table0                      ;Switch current table back to the initial table
    movzx eax,byte [esi]                ;eax = first byte of next instruction
    inc esi                             ;esi = address of byte after the next byte
    jmp [ebx+eax*4]
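The PAUSE/NOP decision made in table0entry90 can also be written as a small C function. This is a sketch only: the function name and the `PREFIX_REP` value are hypothetical, and the caller is assumed to keep the prefix flags in an `unsigned`.

```c
#define PREFIX_REP 0x1u   /* hypothetical flag value */

/* Picks the mnemonic for opcode 0x90 and, for PAUSE, drops the REP flag
 * so that it is not displayed as a separate prefix. */
const char *name_for_opcode_90(unsigned *prefixes)
{
    if (*prefixes & PREFIX_REP) {
        *prefixes &= ~PREFIX_REP;   /* the F3 byte belongs to the PAUSE encoding */
        return "pause";
    }
    return "nop";
}
```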

Upvotes: 2
