Reputation: 140
I have been given an assignment to write a two-pass assembler for the 8086. I want to keep things simple and only assemble small programs for now. I found that the .COM format is very simple, but I cannot find the specifics of the file format.
I also read that execution always begins at 100h. Won't that be a problem if MS-DOS (actually DOSBox in my case) already has system programs loaded there? And do I need to provide some default stub code in the 0h-100h region?
I simply want to know how to write a .COM file that is runnable in DOSBox.
Upvotes: 2
Views: 1598
Reputation: 92966
The .COM format has no structure; it's a flat binary.
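The program (the whole file) is loaded at offset 100h in some segment. Below that, at offsets 0h-FFh, you'll find the PSP (Program Segment Prefix) for your program. DOS builds the PSP itself, so the file contains no stub for that region. The last usable word in the segment (usually at offset FFFEh) is overwritten with 0000h and the stack pointer is pointed at it. This allows you to exit the program with a ret instruction: ret pops 0000h into IP, and the PSP begins with an int 20h instruction, which terminates the program.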
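To make the "flat binary" point concrete, here is a minimal Python sketch that builds a .COM image byte by byte, the way a very small assembler back end might. The opcode bytes are standard 8086 encodings; the message text, the variable names, and the hello.com file name are only illustrative assumptions.

```python
# Hand-assembled 8086 machine code for a tiny .COM program.
# A .COM file is just these bytes: file offset 0 ends up at offset 100h.
LOAD_OFFSET = 0x100  # where DOS places the first byte of the file

code = bytes([
    0xB4, 0x09,        # mov ah, 09h   ; DOS "print $-terminated string"
    0xBA, 0x00, 0x00,  # mov dx, imm16 ; placeholder, patched below
    0xCD, 0x21,        # int 21h
    0xC3,              # ret           ; pops 0000h, lands on the PSP's int 20h
])
message = b"Hello from a .COM file\r\n$"

# The message follows the code, so its run-time offset is 100h + len(code).
msg_offset = LOAD_OFFSET + len(code)
image = bytearray(code + message)
image[3:5] = msg_offset.to_bytes(2, "little")  # 8086 immediates are little-endian


def write_com(path="hello.com"):
    """Write the image to disk; the file name is an arbitrary example."""
    with open(path, "wb") as f:
        f.write(image)
```

Calling write_com() should produce a file with no header or trailer of any kind: its first byte is the first instruction.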
DOS's program loader sets all of CS, DS, ES, and SS to the segment of your program. Then the DOS kernel jumps to offset 0100h (i.e. the start of your program) to run it. (Technically, the program loader doesn't set CS until it does a far jmp or iret to CS:100h; if it had set CS earlier, any IP value would be inside the new program's memory, not the DOS kernel.)
That's really all there is to it. Your program doesn't have to care about segmentation at all, as long as the flat 64K of the "tiny" memory model is sufficient for all your static code+data loaded from the file, the stack at the top, and any memory in between as BSS or "heap". Because all four segment registers hold the same value, every segment base works the same: for example, [bx] and [bp] with the same offset refer to the same linear address, even though bp implies ss: and bx implies ds:.
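The address arithmetic behind that claim can be sketched in a few lines of Python. The segment value 1234h is a made-up example (DOS picks the real one at load time), and linear is just a helper name chosen here.

```python
def linear(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


seg = 0x1234           # hypothetical segment chosen by the loader
cs = ds = es = ss = seg

entry = linear(cs, 0x0100)    # first byte of the .COM file
via_bx = linear(ds, 0x0200)   # [bx] with bx = 0200h goes through ds
via_bp = linear(ss, 0x0200)   # [bp] with bp = 0200h goes through ss
```

Since ds and ss are equal, via_bx and via_bp come out identical, which is why a .COM program can ignore which segment register an addressing mode implies.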
Note that because the DOS kernel picks a segment for your program, it won't collide with any already loaded programs or the DOS kernel. It'll just work as expected.
As for writing COM programs, I recommend using an assembler like NASM with the flat binary output format (bin), which adds no headers or structure. The general template is this:
org 100h        ; tell NASM that the binary is loaded at offset 100h
start: ...      ; the program starts here. This must
                ; be the first thing in the file.
                ; place any variables or constants after the code
Then assemble with
nasm -f bin -o program.com program.asm
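Since the question is about writing a two-pass assembler, here is a rough Python sketch of what the org 100h directive amounts to: the location counter simply starts at 100h, so every label gets its run-time offset. The item format (label, size, encoder function) is a toy invented for this example and is not how NASM works internally.

```python
ORG = 0x100  # plays the role of "org 100h"

# Toy source: each item is (label_or_None, size_in_bytes, encoder).
# The encoder receives the finished symbol table and returns the bytes.
program = [
    ("start", 3, lambda sym: b"\xBA" + sym["msg"].to_bytes(2, "little")),  # mov dx, msg
    (None,    2, lambda sym: b"\xB4\x09"),                                 # mov ah, 09h
    (None,    2, lambda sym: b"\xCD\x21"),                                 # int 21h
    (None,    1, lambda sym: b"\xC3"),                                     # ret
    ("msg",   4, lambda sym: b"Hi!$"),                                     # db 'Hi!$'
]


def assemble(items, origin=ORG):
    # Pass 1: walk the items with a location counter to record label addresses.
    symbols, location = {}, origin
    for label, size, _ in items:
        if label is not None:
            symbols[label] = location
        location += size
    # Pass 2: every label is now known, so forward references can be encoded.
    out = bytearray()
    for _, size, encode in items:
        chunk = encode(symbols)
        assert len(chunk) == size, "pass 1 size must match pass 2 output"
        out += chunk
    return bytes(out), symbols


image, symbols = assemble(program)
```

The forward reference to msg in the first instruction is the reason two passes are needed. The origin only changes the label values, not the layout of the file: the bytes are still written starting at file offset 0.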
For more information, this resource might be helpful to you.
Upvotes: 6