Assembly code for simple coding/decoding of string confusion?

Question

I am studying for my exam and I am confused by this assembly code. It is a program in which the user first enters a string, then that string gets coded and printed, then decoded and printed.

What confuses me is the (de)coding part. With "LEA bx, MyString" the memory address of MyString is saved in register bx. Then the coding takes place. What is the purpose of this?

INC bx
MOV cl, [bx]
XOR ch, ch

coding:
    INC bx
    MOV dl, [bx]
    XOR dl, ah
    MOV [bx], dl
LOOP coding

Why increment the memory address? Doesn't that change the address? Why increment bx again in the loop? These pointers just confuse me. I get the part where the character at address bx is moved to dl, then coded with the mask, then stored back at the same address. I'm just confused by this incrementing of the memory address. Does that mean it starts from the 3rd character instead of the first, and then codes the characters from the 3rd onward with the mask? What happens to the first two then? Sorry if the questions are stupid, thanks!

Here is the full code:

.MODEL small
.DATA
    STR_LENGTH EQU 30
    BUFF_LENGTH EQU STR_LENGTH + 3
    MyString DB BUFF_LENGTH DUP (0)
    Coder_Mask DB 128
.STACK
.CODE

NewLine MACRO
    MOV dl, 10
    MOV ah, 02h
    INT 21h
    MOV dl, 13
    MOV ah, 02h
    INT 21h
ENDM    

DeCode MACRO bx, ah
LOCAL coding

    INC bx
    MOV cl, [bx]
    XOR ch, ch

    coding:
        INC bx
        MOV dl, [bx]
        XOR dl, ah
        MOV [bx], dl
    LOOP coding
ENDM

WriteString MACRO bx
LOCAL writing

    INC bx
    MOV cl, [bx]
    XOR ch, ch

    writing:
        INC bx
        MOV dl, [bx]
        MOV ah, 02h
        INT 21h
    LOOP writing
ENDM

Start:
    MOV ax, @DATA
    MOV ds, ax

    LEA bx, MyString
    MOV cl, BUFF_LENGTH
    MOV [bx], cl
    LEA dx, MyString
    MOV ah, 0Ah
    INT 21h

    NewLine
    LEA bx, MyString
    WriteString bx

    LEA bx, MyString
    MOV ah, Coder_Mask
    DeCode bx, ah

    NewLine
    LEA bx, MyString
    WriteString bx

    NewLine

    LEA bx, MyString
    MOV ah, Coder_Mask
    DeCode bx, ah

    NewLine
    LEA bx, MyString
    WriteString bx

    MOV ax, 4C00h
    INT 21h
END Start

Ped7g · Accepted Answer

You need to understand how the string is laid out in memory.

The teacher's code has no comments at all, so either it was your task to figure it out (and you failed), or I will refrain from commenting any further on your teacher for diplomatic reasons.

The string buffer has the structure that MS-DOS expects for function 0Ah of INT 21h (buffered input):

MyString:
    db string_maximum_size     ; maximum characters to store into buffer
    db character_actually_read ; characters read by INT 21h: 0Ah function
    db string_maximum_size DUP (0)  ; the string characters

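So after entering the string "hello", the memory at address MyString will contain:
33, 5, 104 ('h'), 101 ('e'), 108 ('l'), 108 ('l'), 111 ('o'), 13 (the carriage return that ended the input, which DOS stores but does not count in the length byte), followed by 25 zeroes (left over from DUP (0)).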
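Written out as data with the offset of each byte, the same example looks roughly like the sketch below. Offsets are relative to MyString, and the values assume the program wrote 33 into the first byte, as your code does. Note that DOS also stores the carriage return that ended the input, although it does not count it in the length byte.

```asm
; MyString after typing "hello" and pressing Enter (33 bytes in total)
MyString DB 33              ; offset 0: maximum size, written by the program
         DB 5               ; offset 1: characters read, written by DOS
         DB 'hello'         ; offsets 2..6: the text itself
         DB 13              ; offset 7: the carriage return that ended the input
         DB 25 DUP (0)      ; offsets 8..32: untouched, still zero
```

So the first character lives at offset 2, which is why the code has to step bx forward twice before it touches any text.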

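I also think your code has a bug: it stores the total buffer size (BUFF_LENGTH EQU STR_LENGTH + 3) in the first byte, while from the interrupt description I would expect the first byte to hold only the number of bytes available for the text itself, including the terminating carriage return (so STR_LENGTH + 1 here). You can verify this by entering a string of 31 or more characters and checking in a debugger whether the memory after the MyString buffer (where Coder_Mask lives) gets overwritten. As a buffer size the +3 is fine: two bytes for the maximum size and the actual size, plus one for the carriage return. It is just the wrong value to put into the first byte.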
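One way the buffer could be declared so that the first byte matches the space that really follows it is sketched here. MAX_INPUT is a name made up for this example, and the sketch assumes that the maximum-size byte counts the terminating carriage return as well, which is how function 0Ah is documented.

```asm
STR_LENGTH EQU 30               ; longest text the user may type
MAX_INPUT  EQU STR_LENGTH + 1   ; hypothetical name: text plus the carriage return

MyString   DB MAX_INPUT         ; offset 0: maximum size, read by DOS
           DB 0                 ; offset 1: actual length, filled in by DOS
           DB MAX_INPUT DUP (0) ; offset 2 onward: room for text and carriage return

; ... later, in the code segment ...
    LEA dx, MyString            ; ds:dx -> start of the buffer
    MOV ah, 0Ah                 ; DOS buffered keyboard input
    INT 21h
```

The buffer is still 33 bytes long, but because the first byte is already set at assembly time, the MOV cl, BUFF_LENGTH / MOV [bx], cl pair before the call is no longer needed.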

Now this is what happens in the code:

LEA bx,[MyString]   ; bx = address of first byte of buffer (contains maximum size)
INC bx              ; bx now points to actual size
; instead LEA bx,[MyString+1] could have been used, skipping one INC bx
MOV cl,[bx]         ; cl = actual string size
XOR ch,ch           ; ch = 0 (extending 8 bit value in cl to unsigned 16 bit in cx)
; other option on 386+ CPU is MOVZX cx,BYTE PTR [bx]
; or XOR cx,cx  MOV cl,[bx]
INC bx              ; bx now points to the first character

From there the loop does whatever it wants with the byte at [bx]; the INC bx at the top of each iteration moves on to the next character, until the cx counter reaches 0.
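Put together, the coding loop from your DeCode macro can be read like this. The comments are mine, and the JCXZ line with its done label is an addition that is not in the original code: without it, an empty input leaves cx at 0 and LOOP would run 65536 times.

```asm
    LEA bx, MyString    ; bx -> offset 0 (maximum size)
    MOV ah, Coder_Mask  ; ah = 128 = 80h, the XOR mask
    INC bx              ; bx -> offset 1 (actual length)
    MOV cl, [bx]        ; cl = number of characters typed
    XOR ch, ch          ; cx = cl zero-extended, because LOOP counts in cx
    JCXZ done           ; assumption: skip the loop for an empty string
coding:
    INC bx              ; bx -> next character (offset 2 on the first pass)
    MOV dl, [bx]        ; dl = character
    XOR dl, ah          ; flip bit 7, e.g. 'h' = 68h becomes 0E8h
    MOV [bx], dl        ; store the result back over the character
    LOOP coding         ; cx = cx - 1, jump back while cx is not 0
done:
```

Because XOR with the same mask twice gives back the original value (0E8h XOR 80h = 68h), running the very same loop a second time decodes the string, which is why the program calls DeCode twice.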

You should definitely start up the debugger, step through that code instruction by instruction, point the memory window at MyString, and watch how bx is used to access particular bytes there and how those INC bx instructions fit in.

This will explain it even better than anything else.

edit:

One more thing. I actually kept one secret to myself, which is an integral part of your question.

So "How did I know?": always remember that computers are computational machines. You put in a program (a list of instructions), you put in some numbers, let it execute the instructions, and get the resulting numbers out.

I had the code (the instructions). The next thing I looked for in your code was how the string is defined. I found that it is entered by the user and read by an int 21h function. So I googled the function: how it works and what data it returns. Snap: suddenly it all made sense (except the max size bug, which I decided is simply a bug by your lecturer; it's easy to introduce a bug in ASM even for seasoned programmers).

So always make sure you understand all the instructions, and that you understand well what the input data are (their structure and values). Then you can run everything in your head, just like on the CPU, to find out how those input data turn into output data. It's a purely deterministic computational process; you do not need to guess anything, because what happens next is exactly defined at every stage of the computation.

If you know exactly what those definitions are, it's actually straightforward, easier than any high-level abstraction stuff, just a lot more tedious.

When you are new to ASM, it's much easier to watch this happening in a debugger (and it will also help you understand ASM much faster) than to do it in your head.
