nomad

Reputation: 91

Strange behaviour of Capstone disassembler when running a simple example

I was experimenting with the Capstone disassembler and ran into some strange behaviour.
I wrote a simple program that takes notepad.exe (an x86-64 PE), disassembles its .text section, and prints the disassembly line by line (a slightly modified version of https://stackoverflow.com/a/66140741).

Problem: the disassembly appears to break off just after 0x1c00, start again from the beginning of the .text section, and then end 0x400 bytes earlier than it should.

Note 1: Capstone version: 5.0.1.
Note 2: The file is not corrupted.
Note 3: pefile loads the file correctly.
Note 4: Same behaviour when running the example on Windows.
Note 5: Same behaviour on other files.

Is this a bug or am I doing something wrong?

Code:

import pefile
from capstone import *

exe_file = '/home/user/TEST/notepad.exe'
pe = pefile.PE(exe_file)

# find .text section
offset = False
for section in pe.sections:
    if section.Name == b'.text\x00\x00\x00':
        offset = section.VirtualAddress
        code_ptr = section.PointerToRawData
        code_end_ptr = code_ptr + section.SizeOfRawData
        print("@@@ offset=0x{:0x} code_ptr=0x{:0x} code_end_ptr=0x{:0x}".format(offset, code_ptr, code_end_ptr))
        break

code = pe.get_memory_mapped_image()[code_ptr : code_end_ptr]

# start disassembling text section
md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True
if offset:
    for i in md.disasm(code, offset):
        print(i)
    print("end")

Output:

@@@ offset=0x1000 code_ptr=0x400 code_end_ptr=0x24a00
<CsInsn 0x1000 [cc]: int3 >
<CsInsn 0x1001 [cc]: int3 >
<CsInsn 0x1002 [cc]: int3 >
<CsInsn 0x1003 [cc]: int3 >
<CsInsn 0x1004 [cc]: int3 >
<CsInsn 0x1005 [cc]: int3 >
<CsInsn 0x1006 [cc]: int3 >
<CsInsn 0x1007 [cc]: int3 >
<CsInsn 0x1008 [4c8bdc]: mov r11, rsp>
<CsInsn 0x100b [4881ec88000000]: sub rsp, 0x88>
...
<CsInsn 0x1bf4 [e847fdffff]: call 0x1940>
<CsInsn 0x1bf9 [eb0c]: jmp 0x1c07>
<CsInsn 0x1bfb [4c8d05d659cccc]: lea r8, [rip - 0x3333a62a]>      <-- disassembly breaks off immediately after 0x1c00,
<CsInsn 0x1c02 [cc]: int3 >                                       <-- starts from the beginning of .text,
<CsInsn 0x1c03 [cc]: int3 >
<CsInsn 0x1c04 [cc]: int3 >
<CsInsn 0x1c05 [cc]: int3 >
<CsInsn 0x1c06 [cc]: int3 >
<CsInsn 0x1c07 [cc]: int3 >
<CsInsn 0x1c08 [4c8bdc]: mov r11, rsp>
<CsInsn 0x1c0b [4881ec88000000]: sub rsp, 0x88>
...
<CsInsn 0x255ee [cc]: int3 >
<CsInsn 0x255ef [cc]: int3 >
<CsInsn 0x255f0 [4883790800]: cmp qword ptr [rcx + 8], 0>
<CsInsn 0x255f5 [488d05d4290000]: lea rax, [rip + 0x29d4]>        <-- and ends 0x400 bytes before it should
end

IDA Pro disassembly (for reference): [screenshot not included]
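A likely explanation for the 0x1c00 boundary, offered as a sketch rather than a confirmed diagnosis: `pe.get_memory_mapped_image()` returns a buffer indexed by RVA, but the code above slices it with file offsets (`PointerToRawData`, `SizeOfRawData`). In pefile's source, that method starts from the raw file bytes, cuts or pads them at each section's `VirtualAddress`, and then appends the section's raw data. The mapped bytes from 0x400 to 0x1000 are therefore still the first 0xC00 bytes of `.text` as read from the file, and `.text` begins again at 0x1000. The model below is a simplified, single-section re-implementation of that layout; `mapped_image_model` is a hypothetical name, not a pefile function, and the section size is shrunk to 0x2000 bytes.

```python
# Hypothetical, simplified model of pefile's get_memory_mapped_image() for an
# image with a single section (assumption: mirrors pefile's approach of
# truncating/padding the raw file at the section's RVA, then appending it).
def mapped_image_model(file_bytes: bytes, va: int, raw_ptr: int, raw_size: int) -> bytes:
    mapped = file_bytes[:va]                          # raw file bytes, cut at the RVA
    mapped += b"\x00" * (va - len(mapped))            # zero-pad if the file is shorter than the RVA
    mapped += file_bytes[raw_ptr:raw_ptr + raw_size]  # section's raw data, placed at its RVA
    return mapped


VA, RAW_PTR, RAW_SIZE = 0x1000, 0x400, 0x2000       # RVA/offset as in the question; size shrunk
text = bytes(i % 251 for i in range(RAW_SIZE))       # synthetic stand-in for the .text bytes
file_bytes = b"\x00" * RAW_PTR + text                # zeroed "headers", then .text on disk

image = mapped_image_model(file_bytes, VA, RAW_PTR, RAW_SIZE)

wrong = image[RAW_PTR:RAW_PTR + RAW_SIZE]            # file offsets applied to the RVA-indexed image
right = image[VA:VA + RAW_SIZE]                      # RVAs applied to the RVA-indexed image

SHIFT = VA - RAW_PTR                                 # 0xC00: distance between RVA and file offset
restart = VA + SHIFT                                 # address where .text appears to start over
print(hex(restart))                                  # 0x1c00
print(wrong[SHIFT:] == text[:RAW_SIZE - SHIFT])      # True: .text repeats from its start
print(right == text)                                 # True: RVA-based slice matches the section
```

With `VA = 0x1000` and `RAW_PTR = 0x400` the apparent restart lands at 0x1000 + 0xC00 = 0x1c00, matching the output above. If this is indeed the cause, taking the bytes from `section.get_data()`, or slicing the mapped image from `section.VirtualAddress` instead of `section.PointerToRawData`, should keep the buffer consistent with the `offset` passed to `md.disasm()`.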

Upvotes: 3

Views: 125

Answers (1)

Thanawat

Reputation: 21

"By default, Capstone stops disassembling when it encounters a broken instruction."

Try turning on SKIPDATA mode:

import pefile
from capstone import *

exe_file = '/home/user/TEST/notepad.exe'
pe = pefile.PE(exe_file)

# find .text section
offset = False
for section in pe.sections:
    if section.Name == b'.text\x00\x00\x00':
        offset = section.VirtualAddress
        code_ptr = section.PointerToRawData
        code_end_ptr = code_ptr + section.SizeOfRawData
        print("@@@ offset=0x{:0x} code_ptr=0x{:0x} code_end_ptr=0x{:0x}".format(offset, code_ptr, code_end_ptr))
        break

code = pe.get_memory_mapped_image()[code_ptr : code_end_ptr]

# start disassembling text section
md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True

md.skipdata = True  # turn on skipdata mode

if offset:
    for i in md.disasm(code, offset):
        print(i)
    print("end")

Upvotes: 2
