user2244242

Reputation: 11

DOS assembly separate and print the command line arguments

I am trying to write a simple program in x86 assembly (assembled with MASM). Its purpose is to write the command line arguments to output, each on a new line. Here's what I came up with so far:

data1 segment
   input db 40 dup (?)   ;input                   
data1 ends


code1   segment

START:                           
   mov ax,seg input
   mov ds,ax
   mov dx,offset input
   mov di, dx 

   mov si, 82h
   mov cl,es:[80h]    


word:   
      mov al,es:[si]
      mov ds:[di],al   
      inc si   
      inc di   

      cmp al,0Dh   ;out of arguments? (if YES goto finish)
      jz finish

      cmp al,20h   ;end of word? (if NO goto word)
      jnz word

   mov al, '$'  ;line terminate
   mov ds:[di], al

   mov ah,09h      ;write string
   int 21h 

   xor di,di    ;prepare registry for new word

   call new_line


   loop word

finish: 


   mov al, '$'
   mov ds:[di], al

   mov ah,09h      ;write last argument
   int 21h  


   mov ax,4ch   ;end program
   int 21h


new_line:
   push ax
   push bp
   mov ax,0e0ah ;ah=0e-write char,al=0a-go to new line
   int 10h
   mov al,13     ;carriage return
   int 10h 
   pop bp
   pop ax
ret

code1 ends  
end START

It seems to work fine when tested under emu8086, but after assembling with MASM it gives the correct results in only 10% of executions. Any help would be much appreciated.

Upvotes: 0

Views: 3965

Answers (3)

Sep Roland

Reputation: 39166

Here's a review of your program that tries 'to write command line arguments to output (each in a new line)'.

Review

input db 40 dup (?)

A single command line argument could easily overflow this 40-byte buffer. The command line area in the PSP is 128 bytes long (offsets 80h to FFh), so provide a buffer of that size.

mov si, 82h

You seem to expect the byte at offset 81h to always contain a space character. This is often the case, but it could also be a switchchar like "/". Your program would then report an incomplete argument, because it never looks at offset 81h.
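Here is a small Python model of this problem. It is not DOS code, and the helper name `tail_as_question_reads_it` is made up for illustration. The model assumes, as the review states, that the tail starts right after the program name, so `PROG A B` stores " A B" and `PROG/X` stores "/X". Starting at 82h, as `mov si, 82h` does, loses the switchchar in the second case.

```python
# Hypothetical model of the PSP command tail (offset 81h onward), not real DOS code.
# Assumption: the tail starts right after the program name, so "PROG A B"
# yields b" A B" and "PROG/X" yields b"/X".

CR = 0x0D

def tail_as_question_reads_it(tail: bytes) -> bytes:
    """Mimic 'mov si, 82h': skip the byte at 81h and read up to the CR."""
    data = tail + bytes([CR])   # DOS appends a CR terminator after the tail
    start = 1                   # offset 82h is index 1 when counting from 81h
    end = data.index(CR)
    return data[start:end]

print(tail_as_question_reads_it(b" A B"))  # b'A B'
print(tail_as_question_reads_it(b"/X"))    # b'X'  (the '/' is lost)
```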

mov cl,es:[80h]

The loop word instruction that you use further down depends on the whole 16-bit CX register. You must zero the CH part of that register.
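The Python sketch below shows the consequence. The function `loop_iterations` is a hypothetical helper that mimics LOOP: decrement the 16-bit CX and jump back while CX is not zero. If CH holds leftover garbage, the loop count is built from all 16 bits, not just from the length in CL.

```python
def loop_iterations(ch: int, cl: int) -> int:
    """Count how often a LOOP-terminated body runs when CX = CH:CL.

    LOOP decrements CX and jumps back while CX != 0, so a body entered with
    CX = n (n > 0) runs n times. With CX = 0 on entry, the first decrement
    wraps around to 0FFFFh and the body runs 65536 times.
    """
    cx = ((ch & 0xFF) << 8) | (cl & 0xFF)
    count = 0
    while True:
        count += 1                  # the loop body executes
        cx = (cx - 1) & 0xFFFF      # LOOP: CX = CX - 1 with 16-bit wraparound
        if cx == 0:                 # fall through once CX reaches zero
            return count

print(loop_iterations(0x00, 5))     # 5
print(loop_iterations(0x12, 5))     # 4613 (= 1205h), not 5
```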

mov al,es:[si]
mov ds:[di],al

Because you store the byte even before checking it out, many of your printed arguments will end with an unwanted space character.

cmp al,0Dh   ;out of arguments? (if YES goto finish)
jz finish

Because you store the bytes even before checking them out, your last printed argument will end with a carriage return. That will not produce the same clean newline that you get for the other arguments: a newline needs both the carriage return (13) and the linefeed (10) codes.
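To see both effects at once, here is a Python model of the question's store-then-test loop. The name `split_like_question` is hypothetical. The model deliberately ignores the CX count problems discussed elsewhere and only reproduces the order "store the byte, then test it".

```python
def split_like_question(tail: bytes) -> list[bytes]:
    """Model of the question's copy loop: store each byte, then test it."""
    pieces, current = [], bytearray()
    for byte in tail[1:]:           # 'mov si, 82h' skips the byte at 81h
        current.append(byte)        # mov ds:[di], al happens first...
        if byte == 0x0D:            # ...then the CR test (goto finish)
            pieces.append(bytes(current))
            break
        if byte == 0x20:            # ...then the space test (end of word)
            pieces.append(bytes(current))
            current = bytearray()
    return pieces

print(split_like_question(b" ONE TWO\r"))  # [b'ONE ', b'TWO\r']
```

The first piece keeps its trailing space and the last one keeps its carriage return, as described above.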

cmp al,20h   ;end of word? (if NO goto word)
jnz word

Since your loop is based on a count in CX, you must first decrement that count before you can continue at the label word.

xor di,di    ;prepare registry for new word

It is merely a coincidence that your input buffer begins at offset zero in the data1 segment. Better write this as mov di, OFFSET input.

call new_line

This becomes redundant if you insert the newline codes right before adding the $-terminator.

mov ax,4ch   ;end program

This loads 4Ch into the AL register, but the function number belongs in the AH register. You have to write this as mov ax, 4C00h.
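A quick Python check of how the two immediates split into AH and AL. The helper `split_ax` is hypothetical. INT 21h selects its function from AH, so `mov ax,4ch` leaves AH = 00h, which is a different DOS function than the intended 4Ch.

```python
def split_ax(ax: int) -> tuple[int, int]:
    """Return (AH, AL) for a 16-bit AX value."""
    return (ax >> 8) & 0xFF, ax & 0xFF

print([hex(r) for r in split_ax(0x4C)])    # ['0x0', '0x4c']  -> AH = 00h
print([hex(r) for r in split_ax(0x4C00)])  # ['0x4c', '0x0']  -> AH = 4Ch
```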

new_line:
 ...

Why does this routine preserve the BP register that it doesn't use?
And why rely on the BIOS.Teletype function 0Eh instead of the normally expected DOS.PrintChar function 02h?

Theory

When DOS launches your program, both DS and ES point at the 256-byte Program Segment Prefix (PSP). The program's command line occupies the high half of this memory page. At PSP offset 128 (80h) there's a byte that reports the length of the string that follows at offset 129 (81h). If space permits, the string gets a carriage return terminator, which is not included in the reported count.
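As a sketch of this layout, the Python below builds a toy 256-byte PSP and reads the tail back using only the length byte. The names `make_psp` and `read_tail` are hypothetical. The model assumes the tail may use every byte from 81h to FFh and appends the CR only when a byte is left over.

```python
PSP_SIZE = 256
TAIL_LEN_OFFSET = 0x80   # byte holding the tail length (offset 128)
TAIL_OFFSET = 0x81       # first byte of the tail itself (offset 129)

def make_psp(tail: bytes) -> bytearray:
    """Build a toy 256-byte PSP whose high half holds the given command tail."""
    if len(tail) > PSP_SIZE - TAIL_OFFSET:
        raise ValueError("tail does not fit in the PSP")
    psp = bytearray(PSP_SIZE)
    psp[TAIL_LEN_OFFSET] = len(tail)
    psp[TAIL_OFFSET:TAIL_OFFSET + len(tail)] = tail
    end = TAIL_OFFSET + len(tail)
    if end < PSP_SIZE:            # "if space permits", add the CR terminator
        psp[end] = 0x0D
    return psp

def read_tail(psp: bytes) -> bytes:
    """Read the tail using the length byte; the CR is not part of the count."""
    length = psp[TAIL_LEN_OFFSET]
    return bytes(psp[TAIL_OFFSET:TAIL_OFFSET + length])

psp = make_psp(b" A /B")
print(psp[0x80], read_tail(psp), hex(psp[0x81 + 5]))  # 5 b' A /B' 0xd
```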

The separation between the individual arguments on the command line is not necessarily a single space character! It could be a number of space characters in succession or even one or more tab characters. What separates arguments is called "whitespace".

An individual argument on the command line is not always just a "word" (in the usual sense). It could also be that it is a command switch introduced by a so-called switchchar. DOS has function 3700h for retrieving what the switchchar currently is. Where a switchchar was used, interjected whitespace is optional, so it need not be present.
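The splitting rules above can be sketched in Python like this. `split_args` is a hypothetical helper that mirrors the assembly solution's logic. Its default switchchar "/" is only an assumption: on a real system the value comes from DOS function 3700h.

```python
TAB, SPACE = 0x09, 0x20

def split_args(tail: bytes, switchchar: int = ord("/")) -> list[bytes]:
    """Split a command tail on whitespace (spaces, tabs) and before a switchchar.

    Runs of whitespace separate arguments, and a switchchar starts a new
    argument even when no whitespace precedes it.
    """
    args, i, n = [], 0, len(tail)
    while i < n:
        if tail[i] in (TAB, SPACE):        # skip whitespace between arguments
            i += 1
            continue
        start = i
        i += 1                             # first char (may be the switchchar)
        while i < n and tail[i] not in (TAB, SPACE, switchchar):
            i += 1
        args.append(tail[start:i])
    return args

print(split_args(b"  foo\tbar/x /y"))  # [b'foo', b'bar', b'/x', b'/y']
```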

Practice

My solution first copies the command tail to a normal program buffer that has enough room to always allow for a suitable string terminator. This avoids having to keep updating the count, and it also makes things a bit easier by not having to deal with different segments all the time.

; FASM/DOSBox version (81 bytes)
; ------------------------------
  ORG   256          ; .COM executable

  mov   di, Buffer   ; Both DS and ES point at the 256-byte PSP
  mov   si, 0081h    ; Offset to command line string in the PSP
  xor   cx, cx
  mov   cl, [si-1]   ; Length not including the terminating 13
  rep movsb
  mov   [di], cl     ; CL=0, Make our copy zero-terminated

  mov   ax, 3700h    ; DOS.GetSwitchchar
  int   21h          ; -> DL
  mov   bl, dl

  mov   si, Buffer
NextArg:
  mov   di, OneArg
SkipSpc:             ; Skip whitespace
  lodsb
  cmp   al, 9
  je    SkipSpc
  cmp   al, " "
  je    SkipSpc
  cmp   al, 0
  je    Done
NextChar:
  stosb              ; Store one character of current argument
  lodsb              ; (*)
  cmp   al, 9        ; Argument ends on whitespace
  je    Show
  cmp   al, " "
  je    Show
  cmp   al, bl       ; Argument ends on switchchar
  je    Show
  cmp   al, 0
  jne   NextChar
Show:
  dec   si           ; (*)
  mov   ax, 0A0Dh    ; Newline is 13, 10
  stosw
  mov   byte [di], "$"
  mov   dx, OneArg
  mov   ah, 09h      ; DOS.PrintString
  int   21h          ; -> AL = "$"
  jmp   NextArg

Done:
  mov   ax, 4C00h    ; DOS.TerminateWithReturncode
  int   21h
; -------------------
Buffer: db 256 dup ?
OneArg: db 256 dup ?
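Why `mov ax, 0A0Dh` produces "13, 10": STOSW writes AL to [di] and AH to [di+1], which is little-endian order. The Python sketch below reproduces that byte order with the standard `struct` module. `stosw_bytes` is a hypothetical helper name.

```python
import struct

def stosw_bytes(ax: int) -> bytes:
    """Bytes that STOSW writes to memory: AL first, then AH (little-endian)."""
    return struct.pack("<H", ax)

print(stosw_bytes(0x0A0D))  # b'\r\n' -> 13, 10 (CR LF)
print(stosw_bytes(0x0D0A))  # b'\n\r' -> 10, 13 (LF CR)
```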

Although I don't use it myself, here is what it would look like as an emu8086 .EXE executable:

; MASM/emu8086 version
; --------------------
data1 segment

 Buffer db 256 dup (?)
 OneArg db 256 dup (?)

data1 ends
; --------------------
code1 segment

START:
  mov   ax, SEG Buffer ; Keep DS pointing at the 256-byte PSP
  mov   es, ax       ; Only ES points at data1 segment
  mov   di, OFFSET Buffer
  mov   si, 0081h    ; Offset to command line string in the PSP
  xor   cx, cx
  mov   cl, [si-1]   ; Length not including the terminating 13
  rep movsb
  push  es
  pop   ds           ; Now DS too points at data1 segment
  mov   [di], cl     ; CL=0, Make our copy zero-terminated

  mov   ax, 3700h    ; DOS.GetSwitchchar
  int   21h          ; -> DL
  mov   bl, dl

  mov   si, OFFSET Buffer
NextArg:
  mov   di, OFFSET OneArg
SkipSpc:             ; Skip whitespace
  lodsb
  cmp   al, 9
  je    SkipSpc
  cmp   al, " "
  je    SkipSpc
  cmp   al, 0
  je    Done
NextChar:
  stosb              ; Store one character of current argument
  lodsb              ; (*)
  cmp   al, 9        ; Argument ends on whitespace
  je    Show
  cmp   al, " "
  je    Show
  cmp   al, bl       ; Argument ends on switchchar
  je    Show
  cmp   al, 0
  jne   NextChar
Show:
  dec   si           ; (*)
  mov   ax, 0A0Dh    ; Newline is 13, 10
  stosw
  mov   byte ptr [di], "$"
  mov   dx, OFFSET OneArg
  mov   ah, 09h      ; DOS.PrintString
  int   21h          ; -> AL = "$"
  jmp   NextArg

Done:
  mov   ax, 4C00h    ; DOS.TerminateWithReturncode
  int   21h

code1 ends
end START

Upvotes: 2

Magoo

Reputation: 79982

It's been so long since I looked at any assembler... so here are some big hints rather than a full answer.

Are you sure ES is loaded with the appropriate segment, since you aren't initialising it?

Note that by loading CL with the contents of 80H, you are setting CL to the LENGTH of the command line.

When you loop back to WORD (not a good name for a label, btw, since it's a keyword) you transfer the next byte. All very well and good, but you are NOT decrementing CX, the count of characters in the command line. You should be jumping to the LOOP instruction, which decrements CX and jumps back to the target label as long as CX has not reached 0.

You've very carefully (and correctly) saved BP and AX before executing the INT 10h. Is saving these two registers sufficient? Perhaps other registers are modified as well...

Similarly, the INT 21H - are there any registers that may be changed by the execution of the routine behind this interrupt? If so, you should PUSH them first and POP them back after the routine finishes.

Be very careful about relying on the CR (0DH = 13) to end the line. It will be missing if the available space for arguments is COMPLETELY filled. The character count in CL is more important. Provided you correctly decrement CX by using the LOOP instruction, you won't encounter the CR (IIRC), as it doesn't form part of the count. That assumes, of course, that CX is not changed by all of the folderol checking for a space or writing out the line...
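Here is a Python sketch of that worst case. It assumes, as this answer suggests, that a tail can fill every byte from 81h to FFh. The helper `args_by_count` is hypothetical and splits only on single spaces for brevity. A 127-byte tail ends exactly at offset 100h, so no byte is left for a CR. Bounding the scan by the length byte still finds the argument, whereas a scan for 0Dh would run past the end of the PSP.

```python
TAIL_LEN_OFFSET = 0x80
TAIL_OFFSET = 0x81
PSP_SIZE = 0x100

def args_by_count(psp: bytes) -> list[bytes]:
    """Split the tail on spaces, bounded by the length byte, never by a CR."""
    length = psp[TAIL_LEN_OFFSET]
    tail = psp[TAIL_OFFSET:TAIL_OFFSET + length]
    return [bytes(word) for word in tail.split(b" ") if word]

# Hypothetical worst case: a tail that fills the PSP completely (127 bytes).
tail = b" " + b"X" * 126
psp = bytearray(PSP_SIZE)
psp[TAIL_LEN_OFFSET] = len(tail)
psp[TAIL_OFFSET:] = tail
print(TAIL_OFFSET + len(tail) == PSP_SIZE)  # True: no byte left for a CR
print(0x0D in psp[TAIL_OFFSET:])            # False
print(len(args_by_count(psp)[0]))           # 126
```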

Oh, btw - conventionally, a new line is CR,LF or 0DH,0AH - in that order. On mechanical terminals, this was quite literally moving the printhead back to the left-hand side, then scrolling the paper up by a line. The printheads were quite solid and gathered a large amount of momentum when they were returned against a spring-loaded stop. The consequence was that they'd often bounce and the beginning characters of the next line would be sprayed over the first few columns on the printout as the printhead settled, each new line inexorably jarring the mechanics more and more out of adjustment. In fact, it was not unusual to have a newline be CR LF CR, just to allow the mechanics time to settle.

Upvotes: 1

Michael

Reputation: 58427

Before you start copying the argument strings you do:

mov dx,offset input
mov di, dx

But if there are more than one argument you do this after the first argument has been printed:

xor di,di    ;prepare registry for new word

That should probably have been mov di, dx unless you're absolutely, positively, 100% certain that the offset of input always will be 0.

Upvotes: 0
