Pantokrator

Reputation: 33

Copying strings in assembly

I'm learning assembly and I'm trying to write a procedure that makes strings in assembly. I started by trying to copy one string to another and store the copy in a variable for potential future use.

I'm using emu8086 and I get the following error:

unknown opcode skipped: 64

not 8086 instruction - not supported yet.

org 100h

jmp start

msg:    db      "Hello, World!", 0 
a:      db       0 
start: 


        mov     si,msg
        mov     di, a
        call    _make_str
        mov     a, di
        mov     dx, a
        mov     ah, 09h 
        int     21h 

        mov     ah, 0   
        int     16h 

        ret
_make_str:      
    pusha
_next:
        mov     al, [si]                  
        mov     [di], al                  
        inc     si
        inc     di
        cmp      al, 0                      
        jne    _next
        popa
        ret 

What does this error mean, and what am I doing wrong?

Upvotes: 0

Views: 3387

Answers (2)

Ped7g

Reputation: 16596

You have a somewhat naive expectation of what a string is.

msg:    db      "Hello, World!", 0 

This is assembled into machine code as 14 bytes: one byte per character (in ASCII encoding each character is one byte; other encodings have different characteristics, but emu8086 is ASCII based), plus one more byte for the terminating zero.

a:      db       0 

This is assembled as a single byte with the value zero.

Then your code starts, and _make_str copies 14 bytes from address msg to address a. But address a+1 is equal to start, so the remaining 13 bytes overwrite the machine code of your program itself, running past the call _make_str and the following mov a,di. At ret, the code returns to the address where mov a,di was supposed to be, but there is already byte 64h (ASCII 'd') from the string. That byte is not a valid 8086 opcode (on the 80386 and later it is the FS segment-override prefix), so the unknown opcode is reported.

emu8086 has a built-in debugger, so single-step over the instructions and use the memory view to see for yourself how strings are assembled and what your code does.

Also, in assembler there are no variables. The names in front of db/dw/dd/... are "symbols": they are like bookmarks into the computer's memory, holding the address of the first byte of that value.

If you need room for strings of at most 19 characters, you must allocate/reserve 20 bytes of memory (19 characters plus the terminating zero). Memory will not grow automagically, and nothing will re-address the following content to make room for unexpected data.
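A minimal sketch of the question's program with these fixes, assuming emu8086's default COM layout (CS = DS = ES). The buffer name buf and the print loop are illustrative additions, not from the original. Note also that INT 21h / AH=09h prints a '$'-terminated string, not a zero-terminated one, so this sketch prints the copy one character at a time with AH=02h instead.

```asm
; Sketch: copy a zero-terminated string into a reserved buffer (emu8086, COM).
org 100h

jmp start                          ; skip over the data

msg     db "Hello, World!", 0      ; 13 characters + terminating zero = 14 bytes
buf     db 20 dup(0)               ; reserved room: up to 19 characters + zero

start:
        mov     si, offset msg     ; SI = address of the source (not its contents)
        mov     di, offset buf     ; DI = address of the destination buffer
        call    _make_str          ; copy, including the terminating zero

        mov     si, offset buf     ; print the copy one character at a time
print_next:
        mov     dl, [si]           ; DL = next character
        cmp     dl, 0
        je      print_done         ; stop at the terminating zero
        mov     ah, 02h            ; DOS: write the character in DL
        int     21h
        inc     si
        jmp     print_next
print_done:

        mov     ah, 0              ; BIOS: wait for a key press
        int     16h

        ret

; Copies the zero-terminated string at DS:SI to DS:DI.
; The caller must make sure DI points to a large enough buffer.
_make_str:
        push    ax                 ; plain 8086 pushes instead of pusha
        push    si
        push    di
_next:
        mov     al, [si]
        mov     [di], al
        inc     si
        inc     di
        cmp     al, 0
        jne     _next
        pop     di
        pop     si
        pop     ax
        ret
```

Because buf is a separate 20-byte block, the copy now ends inside reserved data instead of running into the code at start.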


What if I don't know how long the string might be? How do I initialize a variable of changing length?

You don't use data of unknown length. Either you know how long the data may be, or you must impose some maximum (high-level programming languages have limits too; if you push hard enough, you will easily break the notion of being "unlimited").

The approach differs depending on what kind of maximum it is.

For example, if you are reading the user's name, you can say the maximum is 80 bytes, reserve 80 bytes in the data area, and that's it. A user with a longer name will either crash your app or, if you coded it correctly, be able to enter only 79 or 80 characters (depending on whether you need a zero terminator inside the 80-byte array).
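A sketch of the 80-byte case, assuming the name is read with DOS buffered keyboard input (INT 21h, AH=0Ah), which enforces the limit itself: the first byte of the buffer tells DOS the maximum number of characters including the final carriage return, so at most 79 characters can be typed. The buffer name user_buf is hypothetical.

```asm
; Sketch: read a line of at most 79 characters with DOS buffered input.
org 100h

jmp start

; INT 21h / AH=0Ah buffer layout:
;   byte 0    maximum characters to accept, including the final CR (0Dh)
;   byte 1    set by DOS: number of characters actually read, excluding CR
;   byte 2..  the characters, followed by CR
user_buf db 80, 0, 80 dup(0)

start:
        mov     dx, offset user_buf
        mov     ah, 0Ah            ; DOS: buffered keyboard input into DS:DX
        int     21h

        ; replace the trailing CR with a zero terminator
        mov     si, offset user_buf
        mov     bl, [si+1]         ; BL = number of characters read
        mov     bh, 0
        mov     byte ptr [bx+si+2], 0

        ret
```

After this, the name is a zero-terminated string starting at user_buf+2, and it can never be longer than the reserved area.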

If you are reading a small text, like the description of a shop item, you don't know the maximum exactly, but you are sure it's within thousands of bytes (or else something is horribly wrong). Then you can dynamically reserve space on the stack, like sub sp,<length_of_text>, and store the text in the stack area.
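A minimal sketch of reserving text space on the stack, assuming a COM program (so SS = DS) and a hypothetical maximum of 1000 bytes. BP keeps the original SP, so the whole area is released with a single mov.

```asm
; Sketch: temporary text buffer on the stack (COM program, SS = DS = ES).
org 100h

TEXT_MAX equ 1000                  ; hypothetical maximum, even to keep SP word-aligned

start:
        push    bp
        mov     bp, sp             ; remember SP so the area can be released
        sub     sp, TEXT_MAX       ; reserve TEXT_MAX bytes below the old SP
        mov     di, sp             ; DI = start of the reserved area

        ; ... read or build up to TEXT_MAX bytes of text at [di] ...
        mov     byte ptr [di], 0   ; e.g. start with an empty zero-terminated string

        mov     sp, bp             ; release the reserved area
        pop     bp
        ret
```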

If you are reading long data, as in a binary editor for 2+ GB files, you need some array describing which chunks of the file are currently in memory, and you swap them in and out dynamically, using all the available memory the OS offers as a cache ("heap" style memory allocation). If you run out of heap, a modern OS will extend it with virtual memory stored on disk (swap), but it's still better to avoid such situations with a better algorithm and app design.

Also, in emu8086 all the other limits of 16-bit x86 real mode apply. In your COM-like example, your whole code + data + stack must fit into a single 64 KiB segment (starting at offset 0x100, so not even the full 64 KiB), unless your code asks DOS for further available segments and uses them (after loading a COM file in ordinary DOS as the first app, you usually have about 500-580 KiB of memory free after your current segment).
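Asking DOS for another segment could look roughly like this sketch. Assumptions: it runs under real DOS or a DOS emulator that implements INT 21h functions 4Ah, 48h and 49h (emu8086's built-in interrupt emulation may not). Because DOS gives a COM program all free memory at load time, the program first shrinks its own block to 64 KiB before allocating a new one. The 4 KiB request size is arbitrary.

```asm
; Sketch: get an extra 4 KiB segment from DOS in a COM program.
org 100h

start:
        ; A COM program initially owns all free memory, so shrink our own
        ; block (ES = PSP segment at entry) to 64 KiB = 1000h paragraphs.
        mov     bx, 1000h
        mov     ah, 4Ah            ; DOS: resize the memory block at ES
        int     21h
        jc      failed

        mov     bx, 100h           ; 100h paragraphs * 16 bytes = 4 KiB
        mov     ah, 48h            ; DOS: allocate memory
        int     21h
        jc      failed             ; CF set: AX = error code, BX = largest free block
        mov     es, ax             ; ES = new segment, usable at ES:0000h..ES:0FFFh

        ; ... use the new segment, e.g. with ES:DI string instructions ...

        mov     ah, 49h            ; DOS: free the memory block at ES
        int     21h
failed:
        ret
```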

In the 16-bit world, asking about "don't know the size" is extra painful: you had to know sizes ahead of time and plan for them, otherwise you will run into problems very quickly.


EDIT: actually... you are probably hitting a different bug, or it's hard to tell which one.

    mov     si,msg
    mov     di, a

These load the si and di registers with two-byte (16-bit) values read from memory at addresses msg and a. This is MASM syntax, where a memory access doesn't require [].

To get the address of the first byte you need mov si, OFFSET msg or lea si,[msg]. As it is, you use the character data as a memory pointer, so I'm not even sure which memory you overwrite and where, but obviously you overwrite something vital.
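A small sketch of the difference, assuming MASM-style syntax as in emu8086 and msg declared as a byte variable (no colon). Reading the symbol gives the bytes stored there, while OFFSET or LEA gives their address.

```asm
; Sketch: contents of msg versus the address of msg (MASM syntax).
org 100h

jmp start

msg     db "Hello, World!", 0

start:
        mov     ax, word ptr msg   ; AX = 6548h: 'H' (48h) and 'e' (65h) read as a little-endian word
        mov     si, offset msg     ; SI = address of the first byte of msg
        lea     di, [msg]          ; DI = the same address, computed by LEA

        mov     dl, [si]           ; DL = 'H', read through the pointer
        mov     ah, 02h            ; DOS: write the character in DL
        int     21h
        ret
```

Using the value in AX as a pointer would address memory at offset 6548h, which has nothing to do with msg.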

Use the emu8086 debugger to see for yourself exactly what is going on.

(After you fix your code to load addresses properly, you will hit the code overwrite as I described it in the answer.)

Upvotes: 2

Cmaster

Reputation: 72

You need to give your a variable enough space to store all the characters of the source string; right now it has only one byte. Also check out the rep movsb instruction.
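A sketch of the copy done with rep movsb when the length is known at assembly time. MSG_LEN is computed by the assembler as $ - msg (assuming emu8086 accepts this MASM-style expression); the destination still has to be reserved with enough room.

```asm
; Sketch: copy a string of known length with REP MOVSB (COM program, DS = ES).
org 100h

jmp start

msg     db "Hello, World!", 0
MSG_LEN equ $ - msg                ; 14 = 13 characters + terminating zero
buf     db 20 dup(0)               ; must be at least MSG_LEN bytes

start:
        cld                        ; clear the direction flag: SI and DI increment
        mov     si, offset msg     ; DS:SI = source
        mov     di, offset buf     ; ES:DI = destination
        mov     cx, MSG_LEN        ; CX = number of bytes to copy
        rep     movsb              ; copy CX bytes, including the terminating zero
        ret
```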

Upvotes: 0
