Reputation: 11
I am trying to write a simple x86 program (assembled with MASM). Its purpose is to write the command line arguments to the output, each on a new line. Here's what I have come up with so far:
data1 segment
input db 40 dup (?) ;input
data1 ends
code1 segment
START:
mov ax,seg input
mov ds,ax
mov dx,offset input
mov di, dx
mov si, 82h
mov cl,es:[80h]
word:
mov al,es:[si]
mov ds:[di],al
inc si
inc di
cmp al,0Dh ;out of arguments? (if YES goto finish)
jz finish
cmp al,20h ;end of word? (if NO goto word)
jnz word
mov al, '$' ;line terminate
mov ds:[di], al
mov ah,09h ;write string
int 21h
xor di,di ;prepare registry for new word
call new_line
loop word
finish:
mov al, '$'
mov ds:[di], al
mov ah,09h ;write last argument
int 21h
mov ax,4ch ;end program
int 21h
new_line:
push ax
push bp
mov ax,0e0ah ;ah=0e-write char,al=0a-go to new line
int 10h
mov al,13 ;carriage return
int 10h
pop bp
pop ax
ret
code1 ends
end START
It seems to work fine when tested under emu8086, but after assembling with MASM it gives correct results in only 10% of executions. Any help would be much appreciated.
Upvotes: 0
Views: 3965
Reputation: 39166
Here's a review of your program, which tries 'to write command line arguments to output (each in a new line)'.
input db 40 dup (?)
A single command line argument could easily overflow this 40-byte buffer. The DOS command line has 128 bytes, so provide a buffer of that size.
mov si, 82h
You seem to expect the byte at offset 81h to always contain a space character. This is often the case, but it could also be a switchchar like "/". Then your program would be reporting an incomplete argument.
mov cl,es:[80h]
The loop word instruction that you use further down depends on the whole 16-bit CX register. You must zero the CH part of that register.
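A minimal fragment of how the count could be loaded so that LOOP sees a clean CX. It assumes ES still points at the PSP, as it does when the program starts, and it reuses the finish label from the question. The jcxz guard is an addition of mine: LOOP decrements CX before testing it, so entering the loop with CX=0 would run it 65536 times.

```asm
xor  cx, cx          ; clear all 16 bits so CH holds no leftover data
mov  cl, es:[80h]    ; CL = length of the command tail, CX is now 0..127
jcxz finish          ; added guard: an empty command tail skips the loop
```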
mov al,es:[si]
mov ds:[di],al
Because you store the byte even before checking it out, many of your printed arguments will end with an unwanted space character.
cmp al,0Dh ;out of arguments? (if YES goto finish)
jz finish
Because you store the bytes even before checking them, your last printed argument will end with a carriage return. That will not result in the same nice newline that you had for the other arguments. A newline needs both the carriage return (13) and the linefeed (10) codes.
cmp al,20h ;end of word? (if NO goto word)
jnz word
Since your loop is based on a count in CX, you must first decrement that count before you can continue at the label word.
xor di,di ;prepare registry for new word
It is more of a coincidence that your input buffer begins at offset zero in the data1 section. Better write this as mov di, OFFSET input.
call new_line
This becomes redundant if you insert the newline codes right before adding the $-terminator.
mov ax,4ch ;end program
This loads 4Ch in the AL register. The function number belongs to the AH register. You have to write this as mov ax, 4C00h.
new_line: ...
Why does this routine preserve the BP register that it doesn't use?
And why rely on the BIOS.Teletype function 0Eh instead of the normally expected DOS.PrintChar function 02h?
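As a sketch of that alternative, here is how new_line might look when it goes through DOS.PrintChar instead of the BIOS. Unlike BIOS output, DOS output follows redirection, so the newline also ends up in a file when the program is run with > on the command line. The routine keeps the question's label name; saving DX instead of BP is my choice, because DL carries the character.

```asm
; Print CR, LF through DOS function 02h.
; Saves and restores AX and DX, the registers it loads or that DOS returns in.
new_line:
    push ax
    push dx
    mov  dl, 13         ; carriage return
    mov  ah, 02h        ; DOS.PrintChar, character in DL
    int  21h
    mov  dl, 10         ; linefeed
    mov  ah, 02h        ; reloaded so nothing depends on what DOS leaves in AX
    int  21h
    pop  dx
    pop  ax
    ret
```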
When DOS launches your program, both DS and ES point at the 256-byte Program Segment Prefix (PSP). The program's command line occupies the high half of this memory page. At PSP-offset 128 there's a byte that reports the length of the string that follows. If space permits it, then the string gets a carriage return terminator which is not included in the reported count.
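To make that layout concrete, this is what the tail area would typically hold if the program were started from the DOS prompt as prog one two. The program name and the arguments are made up for illustration, and an emulator may fill this area differently.

```asm
; Illustration only, not code to assemble: PSP contents for "prog one two"
; PSP:0080h   db 8            ; length byte, counts the 8 characters below
; PSP:0081h   db " one two"   ; command tail, starts with the separating space
; PSP:0089h   db 13           ; carriage return, not included in the count
```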
The separation between the individual arguments on the command line is not necessarily a single space character! It could be a number of space characters in succession or even one or more tab characters. What separates arguments is called "whitespace".
An individual argument on the command line is not always just a "word" (in the usual sense). It could also be that it is a command switch introduced by a so-called switchchar. DOS has function 3700h for retrieving what the switchchar currently is. Where a switchchar was used, interjected whitespace is optional, so it need not be present.
My solution first copies the command tail to a normal program buffer that has enough room to always allow for a suitable string terminator. This avoids having to keep updating the count info, but it also makes it a bit easier by not having to deal with different segments all the time.
; FASM/DOSBox version (81 bytes)
; ------------------------------
ORG 256 ; .COM executable
mov di, Buffer ; Both DS and ES point at the 256-bytes PSP
mov si, 0081h ; Offset to command line string in the PSP
xor cx, cx
mov cl, [si-1] ; Length not including the terminating 13
rep movsb
mov [di], cl ; CL=0, Make our copy zero-terminated
mov ax, 3700h ; DOS.GetSwitchchar
int 21h ; -> DL
mov bl, dl
mov si, Buffer
NextArg:
mov di, OneArg
SkipSpc: ; Skip whitespace
lodsb
cmp al, 9
je SkipSpc
cmp al, " "
je SkipSpc
cmp al, 0
je Done
NextChar:
stosb ; Store one character of current argument
lodsb ; (*)
cmp al, 9 ; Argument ends on whitespace
je Show
cmp al, " "
je Show
cmp al, bl ; Argument ends on switchchar
je Show
cmp al, 0
jne NextChar
Show:
dec si ; (*)
mov ax, 0A0Dh ; Newline is 13, 10
stosw
mov byte [di], "$"
mov dx, OneArg
mov ah, 09h ; DOS.PrintString
int 21h ; -> AL = "$"
jmp NextArg
Done:
mov ax, 4C00h ; DOS.TerminateWithReturncode
int 21h
; -------------------
Buffer: db 256 dup ?
OneArg: db 256 dup ?
Although I don't use it myself, next is what it would look like for an emu8086 .EXE executable:
; MASM/emu8086 version
; --------------------
data1 segment
Buffer db 256 dup (?)
OneArg db 256 dup (?)
data1 ends
; --------------------
code1 segment
START:
mov ax, SEG Buffer ; Keep DS pointing at the 256-bytes PSP
mov es, ax ; Only ES points at data1 segment
mov di, OFFSET Buffer
mov si, 0081h ; Offset to command line string in the PSP
xor cx, cx
mov cl, [si-1] ; Length not including the terminating 13
rep movsb
push es
pop ds ; Now DS too points at data1 segment
mov [di], cl ; CL=0, Make our copy zero-terminated
mov ax, 3700h ; DOS.GetSwitchchar
int 21h ; -> DL
mov bl, dl
mov si, OFFSET Buffer
NextArg:
mov di, OFFSET OneArg
SkipSpc: ; Skip whitespace
lodsb
cmp al, 9
je SkipSpc
cmp al, " "
je SkipSpc
cmp al, 0
je Done
NextChar:
stosb ; Store one character of current argument
lodsb ; (*)
cmp al, 9 ; Argument ends on whitespace
je Show
cmp al, " "
je Show
cmp al, bl ; Argument ends on switchchar
je Show
cmp al, 0
jne NextChar
Show:
dec si ; (*)
mov ax, 0A0Dh ; Newline is 13, 10
stosw
mov byte ptr [di], "$"
mov dx, OFFSET OneArg
mov ah, 09h ; DOS.PrintString
int 21h ; -> AL = "$"
jmp NextArg
Done:
mov ax, 4C00h ; DOS.TerminateWithReturncode
int 21h
code1 ends
end START
Upvotes: 2
Reputation: 79982
It's been so long since I looked at any assembler, so here are some big hints rather than a complete answer.
Are you sure ES is loaded with the appropriate segment, since you aren't initialising it?
Note that by loading CL with the contents of 80H, you are setting CL to the LENGTH of the command line.
When you loop back to WORD (not a good name for a label, btw - since it's a keyword) you are transferring the next byte. All very well and good - but you are NOT decrementing CL, the count of characters in the command line. You should be jumping to the LOOP instruction which decrements CX and returns to the target label if 0 is not reached.
You've very carefully (and correctly) saved BP and AX before executing the INT 10h. Is saving these two registers sufficient? Perhaps other registers are modified also...
Similarly, the INT 21H - are there any registers that may be changed by the execution of the routine behind this interrupt? If so, you should PUSH them first and POP them back after the routine finishes.
Be very careful about relying on the CR=0DH=13 to end the line. This will be missing if the available space for arguments is COMPLETELY filled. The character count in CL is more important. Provided you correctly decrement CL by using the LOOP instruction, you won't encounter the CR (IIRC) as it doesn't form part of the count. That assumes, of course, that CX is not changed by all of the folderol checking for a space or writing out the line...
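Putting these hints together, a count-driven version of the question's program might look roughly like this. It is a sketch under several assumptions rather than a tested fix: the stack1 segment, the assume line, the 130-byte buffer and the labels next_char, flush, advance, print_arg and print_ret are all my additions, and only a single space (20h) is treated as a separator, so tabs are not handled.

```asm
stack1 segment stack
    dw 64 dup (?)           ; added: a small stack for the call below
stack1 ends

data1 segment
input db 130 dup (?)        ; a full 127-byte tail plus CR, LF and '$'
data1 ends

code1 segment
    assume cs:code1, ds:data1
START:
    mov  ax, seg input
    mov  ds, ax             ; DS = data1, ES still points at the PSP
    mov  di, offset input
    mov  si, 81h            ; first byte of the command tail
    xor  cx, cx
    mov  cl, es:[80h]       ; CX = number of bytes in the command tail
    jcxz done               ; nothing on the command line
next_char:
    mov  al, es:[si]
    inc  si
    cmp  al, 20h            ; a space ends the current argument
    je   flush
    mov  [di], al           ; store only bytes that belong to the argument
    inc  di
    jmp  advance
flush:
    call print_arg
advance:
    loop next_char          ; every byte read decrements CX exactly once
    call print_arg          ; print whatever followed the last space
done:
    mov  ax, 4C00h          ; AH=4Ch terminate, AL=0 return code
    int  21h

; Prints the collected bytes followed by CR, LF, then resets DI.
; Does nothing when the buffer is empty (leading or repeated spaces).
print_arg:
    cmp  di, offset input
    je   print_ret
    mov  word ptr [di], 0A0Dh   ; stored as 0Dh, 0Ah: CR then LF
    mov  byte ptr [di+2], '$'
    mov  dx, offset input
    mov  ah, 09h            ; DOS.PrintString
    int  21h
    mov  di, offset input   ; real offset of the buffer, not an assumed 0
print_ret:
    ret
code1 ends
end START
```

Because the loop stops when the count runs out, it never reads the trailing CR, so no check for 0Dh is needed.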
Oh, btw - conventionally, a new line is CR,LF or 0DH,0AH - in that order. On mechanical terminals, this was quite literally moving the printhead back to the left-hand side, then scrolling the paper up by a line. The printheads were quite solid and gathered a large amount of momentum when they were returned against a spring-loaded stop. The consequence was that they'd often bounce and the beginning characters of the next line would be sprayed over the first few columns on the printout as the printhead settled, each new line inexorably jarring the mechanics more and more out of adjustment. In fact, it was not unusual to have a newline be CR LF CR, just to allow the mechanics time to settle.
Upvotes: 1
Reputation: 58427
Before you start copying the argument strings you do:
mov dx,offset input
mov di, dx
But if there is more than one argument, you do this after the first argument has been printed:
xor di,di ;prepare registry for new word
That should probably have been mov di, dx unless you're absolutely, positively, 100% certain that the offset of input always will be 0.
Upvotes: 0