Reputation: 509
I am disassembling an old .com executable that was assembled from code like this:
.model tiny ; com program
.code ; code segment
org 100h ; code starts at offset 100h
main proc near
mov ah,09h ; function to display a string
mov dx,offset message ; offset of the message string, which is terminated with $
int 21h ; dos interrupt
mov ah,4ch ; function to terminate
mov al,00
int 21h ; Dos Interrupt
main endp
message db "Hello World $" ; Message to be displayed terminating with a $
end main
In hex it looks like this:
B4 09 BA 0D 01 CD 21 B4 4C B0 00 CD 21 48 65 6C 6C 6F 20 57 6F 72 6C 64 20 24
How does the disassembler know where the code ends and the string "Hello World $" starts?
Upvotes: 0
Views: 320
Reputation: 10570
A disassembler does not know where the code ends and where the data starts in a .com file, because the .com format makes no such distinction. A .com file is loaded whole into a single segment, and DOS runs in real mode without any kind of memory protection, so you can, for example, write obfuscated code that looks like regular text and jump into it. For example (this may crash DOS; I haven't tested it):
_start: jmp hello
hello:
db "Hello World!"
ret
So db "Hello World $" is perfectly valid 16-bit code. I checked this with udcli, the disassembler that comes with the udis86 disassembler library for x86 and x86-64, on Linux:
$ echo `echo 'Hello World $' | tr -d "\n" | od -An -t xC` | udcli -x -16
0000000000000000 48 dec ax ; H
0000000000000001 656c insb ; el
0000000000000003 6c insb ; l
0000000000000004 6f outsw ; o
0000000000000005 20576f and [bx+0x6f], dl ; <space>Wo
0000000000000008 726c jb 0x76 ; rl
000000000000000a 642024 and [fs:si], ah ; d<space>$
However, db 0x64, 0x20, 0x24 is not valid 32-bit or 64-bit code, because in those modes the instruction is incomplete. This is the 32-bit disassembly of db "Hello World $":
$ echo `echo 'Hello World $' | tr -d "\n" | od -An -t xC` | udcli -x -32
0000000000000000 48 dec eax ; H
0000000000000001 656c insb ; el
0000000000000003 6c insb ; l
0000000000000004 6f outsd ; o
0000000000000005 20576f and [edi+0x6f], dl ; <space>Wo
0000000000000008 726c jb 0x76 ; rl
000000000000000a 642024 invalid ; d<space>$
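Why the last three bytes decode in 16-bit mode but not in 32-bit mode comes down to the ModRM byte 0x24. Here is a minimal Python sketch of that one decoding step; the helper names `split_modrm` and `needs_sib` are my own for illustration and are not part of udis86:

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def needs_sib(modrm, address_size):
    """Return True if a SIB byte must follow this ModRM byte.

    With 16-bit addressing there is no SIB byte at all. With 32-bit or
    64-bit addressing, rm == 100b on a memory operand (mod != 11b)
    means "a SIB byte follows".
    """
    mod, _reg, rm = split_modrm(modrm)
    return address_size != 16 and rm == 0b100 and mod != 0b11


# The string ends with 64 20 24:
#   64 = FS segment override prefix
#   20 = opcode for "and r/m8, r8"
#   24 = ModRM byte
mod, reg, rm = split_modrm(0x24)
print(mod, reg, rm)            # mod=0 (memory, no displacement), reg=4 (ah), rm=4
print(needs_sib(0x24, 16))     # 16-bit addressing: rm=4 simply means [si]
print(needs_sib(0x24, 32))     # 32-bit addressing: a SIB byte is required
```

In 16-bit mode rm=4 selects [si], so the instruction is complete after three bytes. In 32-bit mode the decoder expects one more byte (the SIB), but the input ends there, hence `invalid`.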
What a disassembler can do is use heuristics and code tracing to decide which parts of the file to print as code and which as data. But it can never know for certain where code ends and where data begins, because in .com files that distinction exists only in the programmer's head, possibly in the source code and in the assembler's limitations, but not in the binary .com file format itself.
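To give an idea of what code tracing means, here is a toy Python sketch run on the 26 bytes from the question. It is not how any real disassembler is implemented: it assumes execution starts at the first byte, decodes only the four opcodes this program happens to use, and treats int 21h with AH=4Ch as the end of the code path. The names `OPCODES` and `trace` are made up for this example.

```python
COM_BASE = 0x100  # DOS loads a .com image at offset 100h in its segment

IMAGE = bytes.fromhex(
    "B4 09 BA 0D 01 CD 21 B4 4C B0 00 CD 21 "   # code
    "48 65 6C 6C 6F 20 57 6F 72 6C 64 20 24"    # "Hello World $"
)

# opcode -> (mnemonic template, operand size in bytes); toy subset only
OPCODES = {
    0xB0: ("mov al, {:#04x}", 1),
    0xB4: ("mov ah, {:#04x}", 1),
    0xBA: ("mov dx, {:#06x}", 2),
    0xCD: ("int {:#04x}", 1),
}


def trace(image, base=COM_BASE):
    """Follow straight-line code from the entry point.

    Returns (listing, end), where end is the file offset at which
    tracing stopped.
    """
    listing = []
    pos = 0
    ah = None  # last known value of AH, needed to interpret int 21h
    while pos < len(image):
        opcode = image[pos]
        if opcode not in OPCODES:
            break  # unknown byte: this toy decoder gives up
        template, size = OPCODES[opcode]
        operand = int.from_bytes(image[pos + 1 : pos + 1 + size], "little")
        listing.append((base + pos, template.format(operand)))
        pos += 1 + size
        if opcode == 0xB4:
            ah = operand
        if opcode == 0xCD and operand == 0x21 and ah == 0x4C:
            break  # DOS terminate: execution never reaches the next byte
    return listing, pos


listing, end = trace(IMAGE)
for address, text in listing:
    print(f"{address:04x}  {text}")
print("not reached by tracing:", IMAGE[end:])
```

The trace stops after the second int 21h, so the remaining 13 bytes are reported as data. Note also that the operand of mov dx is 0x010d, which is exactly the address of the first untraced byte; a real disassembler can use such references as a further hint. All of this is still guessing: a program that jumps into the string would defeat it.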
Upvotes: 1