Pooshkis

Reputation: 43

Assembly 8086: why doesn't a later instruction modify the previous one after execution?

I'm new to assembly and trying to figure out this code:

072A:100 mov word ptr [0107], 4567
072A:106 mov ax, 1234
072A:109 add ax, dx

What I understand is that the first instruction puts two bytes with values 67 45 at address 072A:107. In the end AX = 4567.

What I don't understand is why the later instruction mov ax, 1234 doesn't change the value at address 072A:107 written by the previous mov word ptr [0107] instruction. Why isn't the dump changed?

Thank you in advance.

Upvotes: 0

Views: 437

Answers (1)

Ped7g

Reputation: 16596

When you look at that disassembly (before executing the first instruction), the memory is already loaded with the machine code. I will assume this is a DOS COM file, so cs=ds=ss=0x72A and the first mov will self-modify the second mov.

So the content of memory is already as follows (the middle column is the machine code bytes in hex):

072A:100 C70607016745   (mov word ptr [0107], 4567) <- cs:ip points here
072A:106 B83412         (mov ax, 1234)
072A:109 01D0           (add ax, dx)

After executing the first instruction (C7 06 07 01 67 45: the CPU reads these 6 bytes and decodes them as a mov [..],.. instruction), the memory content changes to:

072A:100 C70607016745   (mov word ptr [0107], 4567)
072A:106 B86745         (mov ax, 4567)  <- cs:ip points here
072A:109 01D0           (add ax, dx)

If you disassemble the machine code now, you will already see the second instruction as "mov ax, 4567". The CPU has no idea that the original source said mov ax, 1234, and as you can see from the machine code in memory, there's no way to reconstruct that: there's no 1234h value anywhere in memory any more. Note also that mov ax, imm16 only writes the AX register, never memory, so executing it cannot put 1234 back at 072A:107.
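To see the byte-level effect without a debugger, here is a minimal Python sketch that just replays the memory write. It is not an emulator; the helper names (store_word, decode_mov_ax_imm16) are made up for this illustration, and it assumes the COM-style layout above with the code loaded at offset 100h.

```python
# Machine code bytes from the listing, loaded at offset 0x100 (COM-file assumption).
CODE = bytes.fromhex("C70607016745" "B83412" "01D0")
LOAD_OFFSET = 0x100

memory = bytearray(0x200)
memory[LOAD_OFFSET:LOAD_OFFSET + len(CODE)] = CODE


def store_word(mem, offset, value):
    """Hypothetical helper: little-endian 16-bit store, low byte first."""
    mem[offset] = value & 0xFF
    mem[offset + 1] = (value >> 8) & 0xFF


def decode_mov_ax_imm16(mem, offset):
    """Hypothetical helper: decode 'B8 lo hi' (mov ax, imm16), return the immediate."""
    assert mem[offset] == 0xB8, "not a mov ax, imm16 opcode"
    return mem[offset + 1] | (mem[offset + 2] << 8)


before = decode_mov_ax_imm16(memory, 0x106)
store_word(memory, 0x107, 0x4567)  # the effect of: mov word ptr [0107], 4567
after = decode_mov_ax_imm16(memory, 0x106)

print(f"before the write: mov ax, {before:04X}")
print(f"after the write:  mov ax, {after:04X}")
```

The store lands on the two immediate bytes of the instruction at 106h (offsets 107h and 108h), which is why the disassembly of that instruction changes while its opcode byte B8 stays put.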

Also, when you reload the code from the executable, it will again be mov ax, 1234, because that's what is stored in the binary after the assembling step, before executing it.

The machine code is not built at runtime from source; the assembler produces the binary machine code at assembly time, so there's nothing to "restore" the second instruction back to mov ax,1234 (the source and the assembler are not relevant at runtime).

If this were some kind of interpreted language that prepares every instruction just before execution by assembling it from source, then the first instruction would have to modify the source to cause self-modification at "interpretation time", but most interpreters don't offer any easy way to modify the source currently being interpreted.

Even toy/simulator machines designed to teach assembly (MARS/SPIM, or an 8-bit assembler simulator) operate at "runtime" on binary machine code, not source code (although they may or may not let self-modification propagate into the simulation; some simulators may ignore it and protect the original machine code from modification for whatever reason).
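The difference between the file on disk and the copy in memory can be sketched like this. IMAGE and load are hypothetical names standing in for the assembled COM file and the DOS loader; the real loader does more, this only copies bytes to offset 100h.

```python
# What the assembler wrote to disk; a bytes object, so it cannot be modified.
IMAGE = bytes.fromhex("C70607016745B8341201D0")


def load(image, load_offset=0x100, size=0x200):
    """Hypothetical loader: copy the image into a fresh, writable memory block."""
    mem = bytearray(size)
    mem[load_offset:load_offset + len(image)] = image
    return mem


first_run = load(IMAGE)
# Effect of executing: mov word ptr [0107], 4567 (little-endian store)
first_run[0x107:0x109] = (0x4567).to_bytes(2, "little")

second_run = load(IMAGE)  # reloading starts again from the untouched file contents

print(first_run[0x106:0x109].hex())   # bytes of the patched in-memory instruction
print(second_run[0x106:0x109].hex())  # bytes of the freshly loaded instruction
```

Only the in-memory copy of the first run is patched; the image itself still holds B8 34 12, so every fresh load starts with mov ax, 1234 again.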
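As a rough illustration of what such a simulator does, here is a toy fetch/decode/execute loop in Python that understands only the three encodings from the listing. It is a sketch, not how MARS/SPIM or any real simulator is implemented; run and the allow_self_modify switch are invented for this example, and DX = 1 is an arbitrary assumed value.

```python
CODE = bytes.fromhex("C70607016745B8341201D0")
CODE_RANGE = range(0x100, 0x100 + len(CODE))


def run(dx, allow_self_modify=True, steps=3):
    """Toy simulator: execute `steps` instructions starting at offset 0x100."""
    mem = bytearray(0x200)
    mem[0x100:0x100 + len(CODE)] = CODE
    ip, ax = 0x100, 0
    for _ in range(steps):
        opcode = mem[ip]
        if opcode == 0xC7 and mem[ip + 1] == 0x06:    # mov word ptr [disp16], imm16
            addr = mem[ip + 2] | (mem[ip + 3] << 8)
            # A "protecting" simulator silently drops writes into the code area.
            if allow_self_modify or addr not in CODE_RANGE:
                mem[addr] = mem[ip + 4]
                mem[addr + 1] = mem[ip + 5]
            ip += 6
        elif opcode == 0xB8:                          # mov ax, imm16
            ax = mem[ip + 1] | (mem[ip + 2] << 8)     # immediate is read from memory NOW
            ip += 3
        elif opcode == 0x01 and mem[ip + 1] == 0xD0:  # add ax, dx
            ax = (ax + dx) & 0xFFFF
            ip += 2
        else:
            raise ValueError(f"unsupported opcode {opcode:02X} at {ip:04X}")
    return ax, ip


ax_real, ip_real = run(dx=0x0001)
ax_protected, _ = run(dx=0x0001, allow_self_modify=False)
print(f"self-modification allowed: AX = {ax_real:04X}")
print(f"code area protected:       AX = {ax_protected:04X}")
```

With the write allowed, the second instruction loads whatever immediate is in memory at the moment it is fetched, so AX ends up as 4567h plus DX; with the code area protected, the original 1234h is still there and AX ends up as 1234h plus DX.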

Warning for assembly newcomers: while self-modifying code may sound cool at first (at least it did to me), it's strongly discouraged:

1) You can NOT use it by default in modern software (unless you go to some lengths to enable it).
2) It hurts performance on modern CPUs a lot. By the time a modern x86 CPU detects the write at 107h, it has already fetched, decoded and speculatively executed several instructions down the line, so it has to throw all of that "future" work away, flush its pipeline (and any cached copies of the stale instruction bytes) and start over. An instruction like mov ax,1234, which might otherwise execute in a single cycle or even alongside another instruction, may instead take 100+ cycles.
3) It allows for hard-to-find bugs if you are not experienced enough to foresee all implications of such code.

So it's valuable to understand the concept and what happens, but don't use it unless you are doing something extra niche/specialized, like a 256B intro where it saves you two bytes; then it's valid.

Upvotes: 2
