Reputation: 143
I have been trying to understand the object code and the EXE file generated by the MASM assembler, but some parts are still unclear to me. I hope someone can help me understand them.
I have a very simple MASM program, Q1.ASM:
.model small
.stack 100h
.data
string db 'hello$'
.code
MAIN PROC
mov ax, @data
mov ds, ax
lea dx , string
mov ah, 9
int 21h
mov ah, 4ch
int 21h
MAIN ENDP
END MAIN
I assembled it in DOSBox with MASM Q1.ASM, which generated Q1.OBJ:
$ xxd Q1.OBJ
00000000: 8008 0006 5131 2e41 534d e196 2500 0006 ....Q1.ASM..%...
00000010: 4447 524f 5550 0444 4154 4104 434f 4445 DGROUP.DATA.CODE
00000020: 0553 5441 434b 055f 4441 5441 055f 5445 .STACK._DATA._TE
00000030: 5854 8f98 0700 4811 0007 0401 fc98 0700 XT....H.........
00000040: 4806 0006 0301 0998 0700 7400 0105 0501 H.........t.....
00000050: e19a 0600 02ff 02ff 035b 8804 0000 a200 .........[......
00000060: d1a0 0a00 0200 0068 656c 6c6f 241c a015 .......hello$...
00000070: 0001 0000 b800 008e d88d 1600 00b4 09cd ................
00000080: 21b4 4ccd 21f0 9c0b 00c8 0115 0101 c407 !.L.!...........
00000090: 1401 0297 8a07 00c1 0001 0100 00ac ..............
Then I ran $ link Q1.OBJ, which generated Q1.EXE:
$ xxd Q1.EXE
00000000: 4d5a 1800 0200 0100 2000 1100 ffff 0200 MZ...... .......
00000010: 0001 c58b 0000 0000 1e00 0000 0100 0100 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000100: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000110: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000200: b801 008e d88d 1602 00b4 09cd 21b4 4ccd ............!.L.
00000210: 2100 6865 6c6c 6f24 !.hello$
Now I have two questions:
1. The object code should contain modification records and relocation bits, but everything is in binary. Is there a way to properly analyse the modification records in the .OBJ file?
2. The generated Q1.EXE file, as you can see, has many blank regions filled with 0000. What are they used for, and what is the significance of the 'L' in the line at 00000200?
Upvotes: 2
Views: 502
Reputation: 5775
Re question 1: My favourite 16-bit tools for this are Object Dumper (ODU.EXE) and Borland Turbo Dump.
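If neither tool is at hand, the record structure of the .OBJ can also be walked by hand. The following is only a minimal sketch, assuming the file is in the usual OMF layout (one type byte, a 16-bit little-endian length, the payload, then a checksum byte). The helper name omf_records and the NAMES table are made up for illustration, the table lists only the record types that occur in this dump, and checksums are not verified. What the question calls "modification records" should correspond to the FIXUPP record (type 9Ch).

```python
import struct

# Q1.OBJ, copied from the xxd dump in the question.
OBJ = bytes.fromhex("""
8008 0006 5131 2e41 534d e196 2500 0006
4447 524f 5550 0444 4154 4104 434f 4445
0553 5441 434b 055f 4441 5441 055f 5445
5854 8f98 0700 4811 0007 0401 fc98 0700
4806 0006 0301 0998 0700 7400 0105 0501
e19a 0600 02ff 02ff 035b 8804 0000 a200
d1a0 0a00 0200 0068 656c 6c6f 241c a015
0001 0000 b800 008e d88d 1600 00b4 09cd
21b4 4ccd 21f0 9c0b 00c8 0115 0101 c407
1401 0297 8a07 00c1 0001 0100 00ac
""")

# OMF record type names (only the types that appear in this file).
NAMES = {
    0x80: "THEADR", 0x88: "COMENT", 0x8A: "MODEND", 0x96: "LNAMES",
    0x98: "SEGDEF", 0x9A: "GRPDEF", 0x9C: "FIXUPP", 0xA0: "LEDATA",
}

def omf_records(data):
    """Yield (file_offset, type_name, payload) for each record.

    The payload excludes the trailing checksum byte.
    """
    pos = 0
    while pos + 3 <= len(data):
        rec_type = data[pos]
        (length,) = struct.unpack_from("<H", data, pos + 1)
        body = data[pos + 3 : pos + 3 + length]  # payload plus checksum
        yield pos, NAMES.get(rec_type, f"{rec_type:02X}h"), body[:-1]
        pos += 3 + length

records = list(omf_records(OBJ))
for offset, name, payload in records:
    print(f"{offset:04x}  {name:<6}  {payload.hex()}")
```

On this dump the walk ends with a FIXUPP record at offset 0x86 followed by MODEND. The two LEDATA records before it carry the 'hello$' string and the 17 code bytes, still with zero placeholders where the linker later fills in addresses.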
Re question 2: The L in the character column at file offset 0x0000020E is part of the machine instruction mov ah, 4ch (the byte 0x4C is the ASCII code for 'L'). Don't bother with it.
The linked image of your program begins at file offset 0x00000200 with the .code segment, followed by one alignment byte 0x00 at file offset 0x00000211, followed by the .data segment at file offset 0x00000212.
The linker assumes that the executable image will be loaded at linear address 0, which never happens; that is why relocations exist.
The MZ relocation table begins at file offset 0x1E and has only one dword entry, 01000000, which should be interpreted as a 16:16 far pointer into the image (offset word first, then segment word). In this case it points to 0000:0001, which is the word at file offset 0x00000201, and that word happens to hold the value 0x0001. It is the immediate field of the machine instruction mov ax, @data, assembled as b80100.
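The lookup described above can be reproduced from the header bytes alone. This is a rough sketch, assuming the standard MZ header layout (relocation count at 0x06, header size in paragraphs at 0x08, relocation table offset at 0x18); the function name relocation_entries is hypothetical.

```python
import struct

# First 0x22 bytes of Q1.EXE, copied from the xxd dump in the question.
HEADER = bytes.fromhex(
    "4d5a 1800 0200 0100 2000 1100 ffff 0200"
    "0001 c58b 0000 0000 1e00 0000 0100 0100"
    "0000"
)

def relocation_entries(header):
    """Return a list of (segment, offset, file_offset), one per relocation."""
    (count,) = struct.unpack_from("<H", header, 0x06)         # number of entries
    (header_paras,) = struct.unpack_from("<H", header, 0x08)  # header size / 16
    (table,) = struct.unpack_from("<H", header, 0x18)         # table file offset
    entries = []
    for i in range(count):
        # Each entry is stored as offset word first, then segment word.
        offset, segment = struct.unpack_from("<HH", header, table + 4 * i)
        file_offset = header_paras * 16 + segment * 16 + offset
        entries.append((segment, offset, file_offset))
    return entries

for segment, offset, file_offset in relocation_entries(HEADER):
    print(f"{segment:04X}:{offset:04X} -> file offset {file_offset:#x}")
# prints "0000:0001 -> file offset 0x201"
```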
The DOS loader allocates memory for the executable at some paragraph address, say 0x4C00. This value must be added to the relocated word (0x0001 + 0x4C00 = 0x4C01), so in memory the instruction mov ax, @data will actually read b8014c, as we could see in a debugger.
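To make the loader's patching step concrete, here is a small sketch that applies the single relocation to a copy of the image. The load segment 0x4C00 is just the example value used above, and relocate is a made-up helper, not anything DOS exposes.

```python
import struct

# Load image of Q1.EXE (file offsets 0x200..0x217 in the question's dump).
IMAGE = bytes.fromhex(
    "b801008ed88d160200b409cd21b44ccd21"  # .code, 17 bytes
    "00"                                  # alignment byte
    "68656c6c6f24"                        # 'hello$'
)

def relocate(image, entries, load_segment):
    """Add load_segment to every word named by a (segment, offset) entry."""
    patched = bytearray(image)
    for segment, offset in entries:
        pos = segment * 16 + offset
        (value,) = struct.unpack_from("<H", patched, pos)
        struct.pack_into("<H", patched, pos, (value + load_segment) & 0xFFFF)
    return bytes(patched)

patched = relocate(IMAGE, [(0x0000, 0x0001)], 0x4C00)
print(patched[:3].hex())  # b8014c
```

Only the two bytes at image offset 1 change; the rest of the code and the string are copied through untouched.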
Upvotes: 0
Reputation: 17185
You can't expect to be able to gain much from looking at the binary output - it's intended to be understood by the CPU (and the operating system) not by the programmer.
I can answer the second question, though: the linker pads the EXE file's header to 512 bytes (0x200), so the actual code begins at file offset 0x200; the long run of zeros is that padding. The header begins with the magic signature "MZ". What follows is the code, with the "L" being the ASCII rendering of the byte 0x4C, the operand of the instruction mov ah, 4ch. The last part is the data section, which here contains only the string.
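The sizes can be checked against the first few header fields. This is a sketch under the assumption of the standard MZ layout (signature, bytes in last 512-byte page, page count, relocation count, header size in paragraphs); the variable names are my own.

```python
import struct

# First 10 bytes of Q1.EXE, copied from the xxd dump in the question.
HEADER_START = bytes.fromhex("4d5a 1800 0200 0100 2000")

signature, last_page_bytes, pages, reloc_count, header_paras = struct.unpack(
    "<2sHHHH", HEADER_START
)

header_size = header_paras * 16  # a paragraph is 16 bytes
# A last-page value of 0 would mean that the final page is completely full.
file_size = (pages - 1) * 512 + last_page_bytes if last_page_bytes else pages * 512
image_size = file_size - header_size

print(signature, hex(header_size), file_size, image_size)
print(chr(0x4C))  # the operand byte of mov ah, 4ch, shown as 'L' by xxd
```

With the values from the dump this gives a 0x200-byte header and a 24-byte image, which matches the 17 code bytes, one alignment byte and the six bytes of 'hello$'.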
Upvotes: 2