Mahek Shamsukha

Reputation: 140

What are the details of the .COM file format?

I have been given an assignment to write a two-pass assembler for the 8086. I wish to keep things simple and only assemble small programs for now. I found that the .COM format is very simple. However, I cannot find the specifics of the file format.

I also read that execution always begins at 100h. Won't that be a problem if MS-DOS (actually DOSBox in my case) already has system programs loaded there? And do I need to provide some default stub code in the 0h-100h part?

I simply want to know how to write a .COM file that is runnable in DOSBox.

Upvotes: 2

Views: 1598

Answers (1)

fuz

Reputation: 92966

The .COM format has no structure; it's a flat binary.
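Since the asker is writing an assembler, here is a minimal sketch of what "flat binary" means in practice: the file is nothing but the encoded instructions followed by the data. The function names (build_hello_com, write_com), the message text and the use of Python are my own assumptions for illustration; the opcodes are the standard 8086 encodings for the four instructions shown.

```python
LOAD_OFFSET = 0x100  # offset at which DOS places the first byte of the file


def build_hello_com() -> bytes:
    """Return the raw bytes of a tiny .COM program that prints a message."""
    # int 21h / AH=09h prints a string that is terminated by '$'.
    message = b"Hello from a .COM file!\r\n$"

    # The code below is 2 + 3 + 2 + 1 = 8 bytes long, so the message that
    # follows it ends up at offset 100h + 8 once the file is loaded.
    code_len = 2 + 3 + 2 + 1
    msg_addr = LOAD_OFFSET + code_len

    code = bytes([
        0xB4, 0x09,                            # mov ah, 09h
        0xBA, msg_addr & 0xFF, msg_addr >> 8,  # mov dx, msg (little-endian imm16)
        0xCD, 0x21,                            # int 21h
        0xC3,                                  # ret
    ])
    assert len(code) == code_len
    return code + message  # no header, no trailer: this is the whole file


def write_com(path: str) -> None:
    """Write the program to disk; the bytes are stored exactly as built."""
    with open(path, "wb") as f:
        f.write(build_hello_com())
```

Calling write_com("hello.com") should give a file you can run from the DOSBox prompt. Note that the address of the message had to account for the 100h load offset, even though nothing in the file records that offset.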

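The program (the whole file) is loaded to address 100h in some segment. Below that, at offsets 0h to ffh, you'll find the PSP for your program. The last usable word in the segment (usually at offset fffeh) will be overwritten with 0000h and the stack pointer pointed to it. This allows you to exit the program with a ret instruction: the ret pops 0000h and jumps to offset 0 of the segment, which is the start of the PSP, and the PSP begins with an int 20h instruction that terminates the program.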
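To make the layout concrete, here is a rough model of the segment right after loading. It is only a sketch under my own assumptions: load_com and near_ret are hypothetical helpers, the size check is a simplification, the stack is assumed to sit at fffeh (the usual case), and only the first two bytes of the PSP (the int 20h instruction) are filled in.

```python
SEGMENT_SIZE = 0x10000  # one 64K segment
PSP_SIZE = 0x100        # the PSP occupies offsets 0h..ffh
STACK_TOP = 0xFFFE      # assumed: last usable word of the segment


def load_com(image: bytes):
    """Model the segment after loading; returns (memory, ip, sp)."""
    if len(image) > STACK_TOP - PSP_SIZE:
        # Simplified limit for this sketch: the image must not reach the stack word.
        raise ValueError("image does not fit in one segment")
    mem = bytearray(SEGMENT_SIZE)
    mem[0:2] = b"\xCD\x20"                          # PSP starts with int 20h
    mem[PSP_SIZE:PSP_SIZE + len(image)] = image     # whole file copied to 100h
    mem[STACK_TOP:STACK_TOP + 2] = b"\x00\x00"      # return address 0000h
    return mem, PSP_SIZE, STACK_TOP


def near_ret(mem: bytearray, sp: int):
    """Model a near ret: pop a little-endian word into ip; returns (ip, sp)."""
    ip = mem[sp] | (mem[sp + 1] << 8)
    return ip, (sp + 2) & 0xFFFF
```

With a program consisting of a single ret (byte c3h), near_ret lands on offset 0, where the int 20h sits. That is why no stub code is needed in the 0h-100h area: the file never contains it, DOS builds it.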

DOS's program loader sets all of CS, DS, ES, and SS to the segment of your program. Then, the DOS kernel jumps to address 0100h (i.e. the start of your program) to run it. (Technically, the program loader doesn't set CS until it does a far jmp or iret to cs:100h; if it had set CS earlier, any IP value would be inside the new program's memory, not the DOS kernel.)

That's really all there is to it. Your program doesn't have to care about segmentation at all, as long as the flat 64K of the "tiny" memory model is sufficient for all your static code+data loaded from the file, stack at the top, and any memory in between as BSS or "heap". Any segment base works the same, so for example [bx] and [bp] address the same linear address even though bp implies ss: and bx implies ds:.

Note that because the DOS kernel picks a segment for your program, it won't collide with any already loaded programs or the DOS kernel. It'll just work as expected.
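The reason "execution begins at 100h" does not collide with anything is that 100h is only an offset within whichever segment DOS picked. A small sketch of the real-mode address calculation (segment times 16 plus offset, wrapped to the 8086's 20 address bits); the helper names and the segment values in the usage note are made up for illustration.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def com_entry_point(segment: int) -> int:
    """Linear address of the first instruction of a .COM program in `segment`."""
    return linear_address(segment, 0x100)
```

For example, com_entry_point(0x1000) is 0x10100 and com_entry_point(0x2000) is 0x20100: the same offset 100h, but different memory. It also shows why [bx] and [bp] reach the same byte when DS and SS hold the same segment value.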

As for writing COM programs, I recommend using an assembler like NASM with the flat binary output format, "bin" (i.e. raw bytes with no headers). The general template is this:

        org     100h            ; Tell NASM that the binary is loaded at 100h

start:  ...                     ; the program starts here.  This must
                                ; be the first thing in the file.

        ; place any variables or constants after the code

Then assemble with

nasm -f bin -o program.com program.asm
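If you are writing your own two-pass assembler rather than using NASM, the org 100h directive boils down to one thing: in the first pass, label addresses start counting at 100h instead of 0. Here is a minimal sketch of that pass, assuming you already know the size of each statement; resolve_labels and the PROGRAM list are hypothetical and are not how NASM itself is implemented.

```python
ORIGIN = 0x100  # what "org 100h" means for a .COM file


def resolve_labels(items, origin=ORIGIN):
    """First pass: map each label to origin + its offset in the output.

    `items` is a list of (label_or_None, size_in_bytes) pairs in file order.
    """
    labels = {}
    address = origin
    for label, size in items:
        if label is not None:
            if label in labels:
                raise ValueError(f"duplicate label: {label}")
            labels[label] = address
        address += size
    return labels


# Sizes of a small example program (hypothetical input to the first pass).
PROGRAM = [
    ("start", 2),   # mov ah, 09h
    (None, 3),      # mov dx, msg
    (None, 2),      # int 21h
    (None, 1),      # ret
    ("msg", 14),    # db 'Hello, world$' plus padding, size chosen arbitrarily
]
```

Here resolve_labels(PROGRAM) maps start to 100h and msg to 108h. The second pass then encodes each instruction using those addresses and writes the bytes out starting at file offset 0; the origin affects the addresses only, never the position of bytes in the file.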

For more information, this resource might be helpful to you.

Upvotes: 6
