Rohit Poduri

Reputation: 109

PIN get assembly opcodes from instruction address

I am using PIN to analyze the instructions of a C program. I compiled the program with GCC on Ubuntu and passed the resulting executable to my Pintool. The Pintool registers an instruction instrumentation routine, which inserts a call to an analysis routine before every instruction. This is my Pintool in C++:

#include "pin.H"
#include <cstdio>
#include <cstdint>

UINT64 icount = 0;

using namespace std;

KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool", "o", "test.out","A pin tool");

FILE * trace;

//====================================================================
// Analysis Routines
//====================================================================

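VOID dump(VOID *ip, UINT32 size) {
    unsigned int i;
    UINT8 opcodeBytes[15]; // an x86 instruction is at most 15 bytes long

    icount++; // count executed instructions so Fini reports a real total

    // PIN_SafeCopy returns the number of bytes it actually copied
    UINT32 fetched = PIN_SafeCopy(&opcodeBytes[0], ip, size);

    if (fetched != size) {
        fprintf(trace, "*** error fetching instruction at address 0x%lx\n",(unsigned long)ip);
        return;
    }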

    fprintf(trace, "\n");
    fprintf(trace, "\n%d\n",size);

    for (i=0; i<size; i++)
        fprintf(trace, " %02x", opcodeBytes[i]); //print the opcode bytes
    fflush(trace);
}

//====================================================================
// Instrumentation Routines
//====================================================================

VOID Instruction(INS ins, void *v) {
      INS_InsertCall( ins, IPOINT_BEFORE, (AFUNPTR)dump, IARG_INST_PTR, IARG_UINT32, INS_Size(ins) , IARG_END);
}

VOID Fini(INT32 code, VOID *v) {
    printf("count = %ld\n",(long)icount);
}

INT32 Usage(VOID) {
    PIN_ERROR("This Pintool failed\n"
          + KNOB_BASE::StringKnobSummary() + "\n");
    return -1;
}

int main(int argc, char *argv[])
{
    if (PIN_Init(argc, argv)) return Usage();

    // Open the trace after PIN_Init so the -o knob (default: test.out) is honored
    trace = fopen(KnobOutputFile.Value().c_str(), "w");
    if (trace == NULL) return -1;

    PIN_InitSymbols();
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);

    // Never returns
    PIN_StartProgram();

    return 0;
}

When I check my output trace, I see output like this:

3
 48 89 e7

5
 e8 78 0d 00 00

1
 55

The first row is the size of the instruction in bytes, and the second row lists the bytes of the instruction in hex.

I saw this forum thread: https://groups.yahoo.com/neo/groups/pinheads/conversations/topics/4405#

There they mention that the Linux output is inconsistent because a 32-bit disassembler was used on 64-bit instructions. My output matches the Linux output shown there, whereas the Windows output shows the x86_64 opcodes I expect.

Any idea how I can get the correct opcodes? If I am doing the disassembly wrong, how can I correct it? I am on a 64-bit PC, so I don't know whether I am somehow doing 32-bit disassembly.

Upvotes: 1

Views: 2253

Answers (2)

nitzanms

Reputation: 1718

Pin has an API for disassembly; you should use it. See this question for how it should be done:

https://reverseengineering.stackexchange.com/questions/12404/intel-pin-how-to-access-the-ins-object-from-inside-an-analysis-function

Upvotes: 1

Peter Cordes

Reputation: 364418

In 32-bit mode, 48 is a 1-byte dec eax (0x40-0x47 are the short inc encodings and 0x48-0x4f the short dec encodings).

In 64-bit mode, it's a REX prefix (with W=1 and the other bits unset, selecting 64-bit operand size). (AMD64 repurposed the whole 0x40-0x4f range of inc/dec short encodings as REX prefixes.)

Decoding 48 89 e7 as one 3-byte instruction, instead of a 1-byte 48 followed by a 2-byte 89 e7, is absolute proof that it's being disassembled in 64-bit mode.
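To make the REX interpretation concrete, here is a minimal sketch of how the bits of a REX byte split into fields (the layout is 0100WRXB). The names RexFields, is_rex_prefix and decode_rex are made up for this illustration; they are not part of PIN or any other library.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a REX prefix byte, laid out as 0100WRXB.
// (Hypothetical struct, for illustration only.)
struct RexFields {
    bool w;  // 1 = 64-bit operand size
    bool r;  // extends the ModRM.reg field
    bool x;  // extends the SIB.index field
    bool b;  // extends ModRM.rm, SIB.base, or the opcode register field
};

// True when the byte is in the 0x40-0x4f range, which 64-bit mode
// treats as a REX prefix (32-bit mode treats it as a short inc/dec).
bool is_rex_prefix(std::uint8_t byte) {
    return (byte & 0xF0) == 0x40;
}

// Splits a REX byte into its four flag bits.
RexFields decode_rex(std::uint8_t byte) {
    assert(is_rex_prefix(byte));
    return RexFields{
        (byte & 0x08) != 0,  // W
        (byte & 0x04) != 0,  // R
        (byte & 0x02) != 0,  // X
        (byte & 0x01) != 0,  // B
    };
}
```

Calling decode_rex(0x48) gives W set and R, X, B clear, which is the plain "64-bit operand size" prefix seen at the start of 48 89 e7.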

So how am I supposed to interpret the instruction here?

As x86-64 instructions, obviously.

For your case, I fed those hex bytes to a disassembler:

db 0x48, 0x89, 0xe7
db 0xe8, 0x78, 0x0d, 0x00, 0x00
db 0x55

nasm -f elf64 foo.asm && objdump -drwC -Mintel foo.o

  400080:       48 89 e7                mov    rdi,rsp
  400083:       e8 78 0d 00 00          call rel32
  400088:       55                      push   rbp

objdump finds the same instruction boundaries that PIN reported, because PIN was decoding the bytes correctly.

The push is presumably at the start of the called function. Concatenating the bytes like this flattens the trace; it isn't a way to make a runnable version, just a way to get the bytes disassembled.
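As a rough sketch of how the e8 instruction's operand is interpreted: the four bytes after e8 are a little-endian signed displacement, relative to the address of the next instruction. The helper names below are made up for this example, and the address 0x400083 is only the one from the listing above, not the address the instruction had in the traced process.

```cpp
#include <cassert>
#include <cstdint>

// Reads a little-endian 32-bit signed displacement from four bytes.
std::int32_t read_rel32(const std::uint8_t* p) {
    const std::uint32_t v = static_cast<std::uint32_t>(p[0])
                          | (static_cast<std::uint32_t>(p[1]) << 8)
                          | (static_cast<std::uint32_t>(p[2]) << 16)
                          | (static_cast<std::uint32_t>(p[3]) << 24);
    return static_cast<std::int32_t>(v);
}

// Computes the target of a 5-byte "call rel32" (opcode 0xe8) located at
// insn_addr. The displacement is added to the address of the NEXT
// instruction, i.e. insn_addr + 5.
std::uint64_t call_target(std::uint64_t insn_addr, const std::uint8_t* insn) {
    assert(insn[0] == 0xe8);
    const std::uint64_t next = insn_addr + 5;
    const std::int64_t rel = read_rel32(insn + 1);
    return next + static_cast<std::uint64_t>(rel);  // wraps correctly for negative rel
}
```

With the bytes e8 78 0d 00 00 placed at 0x400083, this yields 0x400088 + 0xd78 = 0x400e00.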

I should simply ignore the first byte and then use the remaining?

No, of course not. REX prefixes are part of the instruction. Without the 0x48, the first instruction would decode as mov edi,esp, which is a different instruction.
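The effect of the prefix can be seen with a deliberately tiny decoder. This is only a sketch: decode_mov_89 is a hypothetical helper that handles nothing but the register-to-register form of opcode 0x89 (mov r/m, r) with an optional REX prefix, assuming 64-bit mode and ignoring other prefixes such as 0x66. For real work, use a proper disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes "[REX] 89 /r" with ModRM.mod == 3 only.
// Returns an empty string for anything it does not understand.
std::string decode_mov_89(const std::uint8_t* bytes, std::size_t len) {
    static const char* const reg32[16] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
        "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"};
    static const char* const reg64[16] = {
        "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"};
    assert(bytes != nullptr);

    std::size_t i = 0;
    std::uint8_t rex = 0;
    if (i < len && (bytes[i] & 0xF0) == 0x40) rex = bytes[i++];  // optional REX

    if (len - i != 2 || bytes[i] != 0x89) return "";  // need opcode + ModRM
    const std::uint8_t modrm = bytes[i + 1];
    if ((modrm >> 6) != 3) return "";                 // register-direct only

    unsigned reg = (modrm >> 3) & 7;  // source register
    unsigned rm = modrm & 7;          // destination register
    if (rex & 0x04) reg |= 8;         // REX.R extends ModRM.reg
    if (rex & 0x01) rm |= 8;          // REX.B extends ModRM.rm

    // REX.W selects 64-bit operand size; without it the default is 32-bit.
    const char* const* names = (rex & 0x08) ? reg64 : reg32;
    return std::string("mov ") + names[rm] + "," + names[reg];
}
```

Feeding it 48 89 e7 gives "mov rdi,rsp", while 89 e7 alone gives "mov edi,esp": the ModRM byte e7 is the same in both cases (mod=3, reg=4, rm=7), and only the REX.W bit changes the operand size.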

Try looking at some disassembly output for some existing code to get used to what x86-64 instructions look like. For specific encoding details, see Intel's vol.2 manual. It has some intro and appendix sections about instruction-encoding details. (The main body of the manual is the instruction-set reference, with the details of how every instruction works and its opcodes.) See https://software.intel.com/en-us/articles/intel-sdm#three-volume, and other links in the tag wiki.

Upvotes: 1
