XCemaXX

Reputation: 13

How does addressing work in x86 assembly in real mode? Why does a label return different values?

I have two variants of bootloader code (each should place a 1 KiB stack right after the 512-byte bootloader). The starting physical address is always 0x7c00 (label "start"), because the BIOS copies the bootloader code to that address in RAM. When I use "MOV SP, start+1024+512":

  1. SP will be 0x7c00 + 1024 + 512, because the physical address should be SS:SP = (0<<4) + 0x7c00 + 1024 + 512. So "start" = 0x7c00.
  2. SP will be 0 + 1024 + 512, because the physical address should be SS:SP = (0x07c0<<4) + 1024 + 512. So "start" = 0x0000.

But if I write "jmp start", the processor will always go to the address 0x7c00. It will calculate:

  1. (0<<4) + 0x7c00
  2. (0x07c0<<4) + 0
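Both calculations can be checked with a small Python model. This is a sketch: `physical`, `addresses` and `VARIANTS` are hypothetical names, wraparound above 1 MiB is ignored, and it assumes CS holds the same segment value that each variant loads into DS and SS (the BIOS does not guarantee this). The number the assembler substitutes for `start` depends only on `[org]`. The CPU adds `segment << 4` only when memory is actually accessed through SS:SP or CS:IP.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (1 MiB wraparound not modeled)."""
    return (segment << 4) + offset


# name -> ([org] value, segment the variant uses), taken from the two variants
VARIANTS = {
    "variant 1": (0x7C00, 0x0000),
    "variant 2": (0x0000, 0x07C0),
}


def addresses(org: int, seg: int) -> dict:
    start = org               # the assembler replaces `start` with this number
    sp = start + 1024 + 512   # mov sp, start+1024+512 stores only the offset
    return {
        "start": start,
        "sp": sp,
        # the segment is added only when memory is accessed through SS:SP or CS:IP
        "stack_top": physical(seg, sp),
        "start_physical": physical(seg, start),  # assumes CS == seg
    }


for name, (org, seg) in VARIANTS.items():
    a = addresses(org, seg)
    print(f"{name}: start={a['start']:#06x} SP={a['sp']:#06x} "
          f"SS:SP -> {a['stack_top']:#07x}, CS:start -> {a['start_physical']:#07x}")
```

Both variants reach the same physical stack top (0x8200) and the same physical entry point (0x7C00). Only the offsets stored in SP and IP differ.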

Why does 'start' return different values in MOV and JMP? Or is it the same value, with the processor not adding the segment in MOV but adding it in JMP? Also is it true that in this case "start" will be calculated based on DS:SI and not CS:IP? Another example: if I add this code at the end:

mov SI, main
lodsb ;write data from main to AL

Will the processor always go to the full physical address (segment + offset) and get the value 'S' in register AL? SI will be equal to offset only? And will the processor add the segment during execution of lodsb?

Extra questions:

How does processor execute "jmp main"? This instruction is above "mov ds, ax". Therefore code in the 2 variant has an error, but it works.

What is the value of CS register by default when BIOS loads bootloader? Obviously CS:IP should point to the physical address 0x7c00.

1 variant

[bits 16]
[org 0x7c00]
start:  ;offset= 0x7c00
jmp main
db    "Some data" ;actually fake BIOS Parameter Block (BPB)
main: 
mov ax, 0 
mov ds, ax ; data segment =0.
mov ss, ax ; stack segment = 0
mov sp, start+1024+512 ;stack pointer = 0x7c00+1024+512

2 variant

[bits 16]
[org 0x0000]
start:  ;offset= 0x0000
jmp main
db    "Some data" ;actually fake BIOS Parameter Block (BPB)
main:
mov ax, 0x07c0 
mov ds, ax ; data segment =0x07c0.
mov ss, ax ; stack segment =0x07c0.
mov sp, start+1024+512 ;stack pointer = 0+1024+512

Upvotes: 1

Views: 764

Answers (1)

Martin Rosenau

Reputation: 18503

... always go to the address 0x7c00. It will calculate ...

In segmented memory models, you should not think only about the physical address. You always have to think of an address as a (16+16)-bit segment:offset pair in real mode or 16-bit protected mode, or as a (16+32)-bit pair in 32-bit protected mode.

Let's say your program contains the instruction mov al, cs:[100h].

This instruction will read a byte from the address CS:0x100, which corresponds to the physical address (CS<<4)+0x100.

If you perform a jump to 0x7C0:0, this instruction will access the memory at address (0x7C0<<4)+0x100=0x7D00; if you perform a jump to 0:0x7C00, this instruction will access the memory at address (0<<4)+0x100=0x100.

This means that your program does something different if you jump to 0x7C0:0 or to 0:0x7C00. For this reason, it is said that 0x7C0:0 and 0:0x7C00 are two different addresses.
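The example can be reproduced with a few lines of Python. This is a sketch: `physical` and `ENTRY_POINTS` are hypothetical names, and wraparound above 1 MiB is ignored. Both entry points start executing at physical address 0x7C00, but the same `cs:[100h]` operand resolves to different bytes.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (1 MiB wraparound not modeled)."""
    return (segment << 4) + offset


# Two ways the same boot sector could be entered: CS:IP = 0x7C0:0 or 0:0x7C00
ENTRY_POINTS = [(0x07C0, 0x0000), (0x0000, 0x7C00)]

for cs, ip in ENTRY_POINTS:
    entry = physical(cs, ip)        # where execution starts
    operand = physical(cs, 0x100)   # byte read by mov al, cs:[100h]
    print(f"CS:IP={cs:04X}:{ip:04X} entry={entry:#07x} cs:[100h] -> {operand:#07x}")
```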

Let's assume that main is located at the physical address 0x7C40.

This means that the address of main is neither 0x7C40 nor 0x40. It is either 0:0x7C40 (in "variant 1") or 0x7C0:0x40 (in "variant 2"), because you always have to specify the address as a pair of segment and offset.
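A short Python enumeration shows how many different segment:offset addresses share one physical location. This is a sketch: `aliases` is a hypothetical helper, offsets are limited to 16 bits, and wraparound above 1 MiB is ignored. It lists every pair that maps to the assumed physical address of main.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (1 MiB wraparound not modeled)."""
    return (segment << 4) + offset


def aliases(phys: int) -> list:
    """All segment:offset pairs with a 16-bit offset that map to physical address phys."""
    pairs = []
    for seg in range(0x10000):
        off = phys - (seg << 4)
        if 0 <= off <= 0xFFFF:
            pairs.append((seg, off))
    return pairs


MAIN = 0x7C40  # assumed physical address of main, as in the text
pairs = aliases(MAIN)
print(f"{len(pairs)} segment:offset pairs map to {MAIN:#07x}")
for seg, off in [(0x0000, 0x7C40), (0x07C0, 0x0040)]:   # variant 1 and variant 2
    print(f"{seg:04X}:{off:04X} in list: {(seg, off) in pairs}")
```

The addresses used by variants 1 and 2 are just two of these many aliases. The code decides which one it uses by choosing `[org]` and the segment registers.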

In segmented memory models in protected mode this is even more complicated and it is much more important to use the correct segments!

SI will be equal to offset only?
Also is it true that in this case "start" will be calculated based on DS:SI and not CS:IP?

The lodsb instruction accesses address DS:SI, stosb accesses ES:DI.

This means that SI only holds the offset and DS only holds the segment.

Variants 1 and 2 will load different values into the SI register, because main is located at address 0:0x7C40 (this means: SI=0x7C40) in one variant and at 0x7C0:0x40 (SI=0x40) in the other.

So in variant 1 you'll have to set DS=0, and in variant 2 you'll have to set DS=0x7C0.

In one case, lodsb will access address 0:0x7C40, in the other case, lodsb will access address 0x7C0:0x40. In both cases the same byte in RAM is accessed: Physical address 0x7C40.
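The lodsb behaviour can be modeled in Python. This is a sketch: the `lodsb` function here is hypothetical and imitates the instruction with the direction flag clear. The 'S' byte placed at main's assumed physical address is invented for the demo. SI holds only the offset, and the segment is added from DS at the moment of the access.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (1 MiB wraparound not modeled)."""
    return (segment << 4) + offset


def lodsb(memory: bytearray, ds: int, si: int) -> tuple:
    """Model of lodsb with the direction flag clear: AL = byte at DS:SI, then SI += 1."""
    al = memory[physical(ds, si)]
    return al, (si + 1) & 0xFFFF  # SI is a 16-bit register


memory = bytearray(1 << 20)   # 1 MiB of real-mode address space
memory[0x7C40] = ord("S")     # hypothetical byte stored at main's physical address

al1, si1 = lodsb(memory, ds=0x0000, si=0x7C40)  # variant 1: mov si, main gives 0x7C40
al2, si2 = lodsb(memory, ds=0x07C0, si=0x0040)  # variant 2: mov si, main gives 0x40
print(chr(al1), hex(si1), chr(al2), hex(si2))
```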

How does processor execute "jmp main"? This instruction is above "mov ds, ax". Therefore code in the 2 variant has an error, but it works.

There are two variants of the JMP instruction:

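One variant does not write a fixed value to the IP register. Instead, it adds a constant signed displacement to IP (measured from the end of the jump instruction) and leaves CS unchanged. So if the jump moves IP forward by 0x40, code execution continues at 0x7C0:0x40 when starting from 0x7C0:0, and at 0:0x7C40 when starting from 0:0x7C00. In both cases, the next instruction is located at the physical address 0x7C40. x86 has no direct absolute near jump, so jmp main is assembled as this relative variant. It uses only CS and IP, never DS, which is why variant 2 works even though DS is set afterwards.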
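The relative jump can be modeled in Python to show why it is position-independent. This is a sketch: `encode_jmp_short` and `run_jmp_short` are hypothetical helpers, and they model only the 2-byte short form (opcode EB plus a signed 8-bit displacement). An assembler may choose the 3-byte near form (E9) instead, but the principle is the same. Main is placed 0x40 bytes after start, following the text's assumption. The real `jmp main` in the question only skips 9 bytes of data.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (1 MiB wraparound not modeled)."""
    return (segment << 4) + offset


def encode_jmp_short(jmp_ip: int, target_ip: int) -> bytes:
    """Encode `jmp short target` (EB + signed 8-bit displacement) placed at offset jmp_ip."""
    rel = target_ip - (jmp_ip + 2)  # measured from the end of the 2-byte instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])


def run_jmp_short(code: bytes, cs: int, ip: int) -> tuple:
    """Execute the short jump located at CS:IP: only IP changes, CS is kept."""
    rel = int.from_bytes(code[1:2], "little", signed=True)
    return cs, (ip + 2 + rel) & 0xFFFF


v1 = encode_jmp_short(0x7C00, 0x7C40)   # assembled with [org 0x7c00]
v2 = encode_jmp_short(0x0000, 0x0040)   # assembled with [org 0x0000]
print(v1.hex(), v2.hex())               # both variants produce the same machine code

for cs, ip in [(0x0000, 0x7C00), (0x07C0, 0x0000)]:
    new_cs, new_ip = run_jmp_short(v2, cs, ip)
    print(f"{cs:04X}:{ip:04X} -> {new_cs:04X}:{new_ip:04X} = {physical(new_cs, new_ip):#07x}")
```

Because the encoded bytes do not depend on `[org]`, the same jump lands on physical 0x7C40 whichever CS:IP the BIOS used to enter the boot sector.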

The other variant (a far jump) takes a pair of segment and offset as its operand and loads both CS and IP. You cannot jump to address 0x7C40, but you can jump either to address 0:0x7C40 or to address 0x7C0:0x40.

What is the value of CS register by default when BIOS loads bootloader?

There are a few BIOSes that jump to 0x7C0:0, but the standard seems to be 0:0x7C00.

For this reason, many boot loaders perform a far jump early on (for example, to 0x7C0:0x60) to ensure that the CS register has a defined value.
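A far jump can be modeled in Python to show why it normalizes CS. This is a sketch: `encode_jmp_far` and `run_jmp_far` are hypothetical helpers modeling the direct far jump encoding (opcode EA, then a 16-bit offset and a 16-bit segment, both little-endian). The target 0x7C0:0x60 is the example from the text. In NASM source such a jump could be written as, for instance, `jmp 0x07C0:main` with `[org 0x0000]`.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (1 MiB wraparound not modeled)."""
    return (segment << 4) + offset


def encode_jmp_far(segment: int, offset: int) -> bytes:
    """Encode a direct far jump to segment:offset (EA, offset, segment; little-endian)."""
    return bytes([0xEA]) + offset.to_bytes(2, "little") + segment.to_bytes(2, "little")


def run_jmp_far(code: bytes) -> tuple:
    """Execute the far jump: CS and IP both come from the instruction bytes."""
    offset = int.from_bytes(code[1:3], "little")
    segment = int.from_bytes(code[3:5], "little")
    return segment, offset


far = encode_jmp_far(0x07C0, 0x0060)   # target from the example in the text
print(far.hex())

# Whether the BIOS entered at 0:0x7C00 or at 0x7C0:0, CS:IP afterwards is the same
for bios_cs, bios_ip in [(0x0000, 0x7C00), (0x07C0, 0x0000)]:
    cs, ip = run_jmp_far(far)
    print(f"entered at {bios_cs:04X}:{bios_ip:04X} -> CS:IP = {cs:04X}:{ip:04X} "
          f"({physical(cs, ip):#07x})")
```

The far jump loads CS from the instruction itself, so any later code that depends on CS (for example `cs:`-prefixed accesses) behaves the same under either BIOS convention.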

Upvotes: 1
