Asa Hunt

Reputation: 129

How can I obtain a PE file's instructions using Python?

So I'm trying to write a basic disassembler for a school project using Python. I'm using the pydasm and capstone libraries. What I don't understand is how I can actually access the assembly instructions of a program using Python. These libraries allow me to disassemble instructions, but I can't figure out how to access a program's instructions in Python. Could anyone give me some direction?

Thanks.

Upvotes: 2

Views: 9005

Answers (3)

user9594728


Here is my code, which disassembles an exe file and prints the result as x86 assembly. It uses the pefile and capstone libraries.

#!/usr/bin/python
import pefile
from capstone import Cs, CS_ARCH_X86, CS_MODE_32

# load the target PE file
pe = pefile.PE("/file/path/code.exe")

# get the address of the program entry point from the program header
entrypoint = pe.OPTIONAL_HEADER.AddressOfEntryPoint

# compute memory address where the entry code will be loaded into memory
entrypoint_address = entrypoint+pe.OPTIONAL_HEADER.ImageBase

# get the binary code from the PE file object
binary_code = pe.get_memory_mapped_image()[entrypoint:entrypoint+100]

# initialize disassembler to disassemble 32 bit x86 binary code
disassembler = Cs(CS_ARCH_X86, CS_MODE_32)

# disassemble the code
for instruction in disassembler.disasm(binary_code, entrypoint_address):
    print("%s\t%s" %(instruction.mnemonic, instruction.op_str))

Make sure to change the path so that it points to your exe file. You also need to choose how much code to disassemble on the binary_code line: the slice [entrypoint:entrypoint+100] selects 100 bytes, not 100 instructions, so the number of instructions printed depends on how long each one is. Change the length as needed.
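
One caveat with the script above is that CS_MODE_32 is hard-coded, so a 64-bit executable would be decoded incorrectly. Below is a minimal sketch, using only the standard library, of how the Machine field in the PE file header could be checked first. The helper name pe_bitness is hypothetical, and only the two x86 machine types are handled. With pefile the same value should be available as pe.FILE_HEADER.Machine, and capstone provides CS_MODE_64 for the 64-bit case.

```python
import struct

# Machine values from the PE/COFF specification.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664


def pe_bitness(data):
    """Return 32 or 64 for the raw bytes of an x86/x64 PE file.

    Hypothetical helper, not part of pefile or capstone.
    """
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    if machine == IMAGE_FILE_MACHINE_I386:
        return 32
    if machine == IMAGE_FILE_MACHINE_AMD64:
        return 64
    raise ValueError("unsupported machine type: {:#x}".format(machine))
```

Usage would be something like pe_bitness(open(path, "rb").read()), picking CS_MODE_32 or CS_MODE_64 based on the result.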


Upvotes: 0

Neitsa

Reputation: 8176

Be cautious about disassembling from the base of the code section: that location might not contain only code (imports or read-only data may be stored there).

The best way to start a disassembly is by looking at the AddressOfEntryPoint field in the IMAGE_OPTIONAL_HEADER. It holds the relative virtual address (RVA) of the first executed byte in the PE file (unless TLS callbacks are present, since those run first, but that's another subject).
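
To make the header layout concrete, here is a rough sketch that reads AddressOfEntryPoint by hand with the standard struct module and maps the RVA to a file offset through the section table. The function name entry_point_file_offset is hypothetical, the offsets are the ones I remember from the PE/COFF layout, and real-world files have edge cases (alignment quirks, entry points outside any section) that this does not handle.

```python
import struct


def entry_point_file_offset(data):
    """Return (entry_rva, file_offset) for the entry point of a PE file.

    Hypothetical helper: a hand-rolled parse of the PE headers.
    """
    if data[:2] != b"MZ":
        raise ValueError("not a PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # IMAGE_FILE_HEADER: NumberOfSections at +2, SizeOfOptionalHeader at +16.
    file_header = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", data, file_header + 2)
    (opt_size,) = struct.unpack_from("<H", data, file_header + 16)
    # IMAGE_OPTIONAL_HEADER: AddressOfEntryPoint at +16 (PE32 and PE32+).
    opt_header = file_header + 20
    (entry_rva,) = struct.unpack_from("<I", data, opt_header + 16)
    # The section table follows the optional header; each entry is 40 bytes.
    section_table = opt_header + opt_size
    for i in range(num_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        virt_size, virt_addr, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, section_table + 40 * i + 8)
        span = max(virt_size, raw_size)
        if virt_addr <= entry_rva < virt_addr + span:
            return entry_rva, raw_ptr + (entry_rva - virt_addr)
    raise ValueError("entry point is not inside any section")
```

The bytes to disassemble then start at data[file_offset:], and the address to hand to the disassembler is ImageBase + entry_rva.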

A very good library for browsing PE files in Python is pefile.

Here's an example to get the first 10 bytes at the program entry point:

#!/usr/local/bin/python2
# -*- coding: utf8 -*-
from __future__ import print_function
import sys
import os.path
import pefile


def find_entry_point_section(pe, eop_rva):
    for section in pe.sections:
        if section.contains_rva(eop_rva):
            return section

    return None


def main(file_path):
    print("Opening {}".format(file_path))

    try:
        pe = pefile.PE(file_path, fast_load=True)
        # AddressOfEntryPoint is the RVA of the first byte executed
        # (TLS callbacks aside).
        eop = pe.OPTIONAL_HEADER.AddressOfEntryPoint
        code_section = find_entry_point_section(pe, eop)
        if not code_section:
            return

        print("[+] Code section found at offset: "
              "{:#x} [size: {:#x}]".format(code_section.PointerToRawData,
                                          code_section.SizeOfRawData))

        # get first 10 bytes at entry point and dump them
        # (bytearray yields ints on both Python 2 and Python 3)
        code_at_oep = code_section.get_data(eop, 10)
        print("[*] Code at EOP:\n{}".
              format(" ".join("{:02x}".format(b)
                              for b in bytearray(code_at_oep))))

    except pefile.PEFormatError as pe_err:
        print("[-] error while parsing file {}:\n\t{}".format(file_path,
                                                              pe_err))

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("[*] {} <PE_Filename>".format(sys.argv[0]))
    else:
        file_path = sys.argv[1]
        if os.path.isfile(file_path):
            main(file_path)
        else:
            print("[-] {} is not a file".format(file_path))

Simply pass the name of your PE file as the first argument.

In the above code, the code_at_oep variable holds the first few bytes at the entry point. From there you can pass these bytes to the capstone engine.

Note that these first bytes might simply be a jmp or call instruction, so you'll have to follow the code execution in order to disassemble correctly. Correctly disassembling a program is still an open problem in computer science...
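
As an illustration of what "following" a jmp or call involves, here is a small sketch that computes the destination of a relative near branch by hand: the target is the address of the next instruction plus the signed displacement. The helper name near_branch_target is hypothetical, and it only covers three common encodings in 32- or 64-bit mode (E8 call rel32, E9 jmp rel32, EB jmp rel8). Prefixes, indirect branches and address wraparound are ignored. In practice a disassembler such as capstone decodes the operand for you.

```python
import struct


def near_branch_target(code, address):
    """Return the target of a leading relative jmp/call, or None.

    Hypothetical helper. `code` is the raw bytes and `address` is the
    virtual address of the first byte.
    """
    code = bytes(code)
    opcode = code[0:1]
    if opcode in (b"\xe8", b"\xe9") and len(code) >= 5:
        # call rel32 / jmp rel32: 1 opcode byte + 4-byte signed displacement
        (disp,) = struct.unpack_from("<i", code, 1)
        return address + 5 + disp
    if opcode == b"\xeb" and len(code) >= 2:
        # jmp rel8: 1 opcode byte + 1-byte signed displacement
        (disp,) = struct.unpack_from("<b", code, 1)
        return address + 2 + disp
    return None
```

Disassembly would then continue at the returned address (and, for a call, also at the instruction after the call).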

Upvotes: 5

Jesse Rusak

Reputation: 57188

This depends on what OS you're using. You have some other questions about Linux, so I'm assuming you're using that. On Linux, executables are typically in ELF format, so you'll need either a Python library that can read ELF files or some other tool to extract the part of the ELF file that you want.

The actual instructions are stored in the .text section, so if you extract that section's contents, those should be the raw bytes you want to disassemble.
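
If you'd rather not pull in a library, the section lookup can be sketched with the standard struct module. This is only a rough outline under some assumptions: the function name elf_text_section is hypothetical, only little-endian files are handled, and the file must still have its section headers (stripped or packed binaries may not).

```python
import struct


def elf_text_section(data):
    """Return (virtual_address, raw_bytes) of .text in a little-endian ELF.

    Hypothetical helper: a hand-rolled parse of the ELF section headers.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[5:6] != b"\x01":
        raise ValueError("only little-endian ELF is handled in this sketch")
    is_64 = data[4:5] == b"\x02"  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    if is_64:
        (shoff,) = struct.unpack_from("<Q", data, 0x28)
        shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x3A)
        fields, field_off = "<QQQ", 0x10  # sh_addr, sh_offset, sh_size
    else:
        (shoff,) = struct.unpack_from("<I", data, 0x20)
        shentsize, shnum, shstrndx = struct.unpack_from("<HHH", data, 0x2E)
        fields, field_off = "<III", 0x0C  # sh_addr, sh_offset, sh_size

    def header(index):
        # Read sh_name plus the address/offset/size triple of one section.
        base = shoff + index * shentsize
        (name_off,) = struct.unpack_from("<I", data, base)
        addr, offset, size = struct.unpack_from(fields, data, base + field_off)
        return name_off, addr, offset, size

    # Section names live in the string table that e_shstrndx points at.
    _, _, strtab_off, strtab_size = header(shstrndx)
    strtab = data[strtab_off:strtab_off + strtab_size]
    for i in range(shnum):
        name_off, addr, offset, size = header(i)
        name = strtab[name_off:strtab.index(b"\0", name_off)]
        if name == b".text":
            return addr, data[offset:offset + size]
    raise ValueError("no .text section found")
```

The returned bytes and address can be passed straight to a disassembler. Keep in mind that .text is not guaranteed to be the only section containing code.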

Upvotes: 1
