user145453
user145453

Reputation: 327

Issue when entering protected-mode from bootloader

this assembly code enters the protected mode from bootloader but fails to reset CS segment (or do the far jump) after the far jump is called and reboots. If I remove the far jump it goes into the infinite loop (0x66,jmp $) in protected mode just fine, without rebooting.

[bits 16]
[org 0x7c00]
xor ax,ax
xor eax,eax
add eax,ENTRY_POINT_32 ;address to plug to far jmp
mov [ENTRY_OFF],eax
xor eax,eax
mov eax,GDT                ;load GDT label address
mov [GDTR+2],eax ; load it into address space in GDTR
lgdt [GDTR]                   ;load GDTR
cli                                    ;turn off masked interrupts
in al,0x70
or al,0x80
out 0x70,al                     ;turn off nonmasked interrupts
in al,0x92
or al,2
out 0x92, al ;open line A20 (change address 20 to 32 bits)
mov eax,cr0
or al,1
mov cr0,eax                 ;switch to protected mode
db 0x66                        ;prefix of opcode to change bitness
db 0xEA                       ;opcode of jmp far
ENTRY_OFF dd 0x0 ;32 bit offset of 32 bit instructions
dw 00001000b ; selector 1st descriptor CODE_descr,=1
ENTRY_POINT_32:
db 0x66                      ;prefix of opcode to change bitness
jmp $                          ;infinite jump to the same location
GDT:
NULL_descr dd 0x0,0x0 ; must be present in GDT
CODE_descr db  0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0
;descriptor of 32 bit code segment, base 0, size ffffffff
DATA_descr db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0
;descriptor of 32 bit data segment, base 0, size ffffffff
VIDEO_descr 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
;descriptor of video buffer, base 0x000B8000, size ffff
GDT_size db $-GDT ;size of GDT table
GDTR dw GDT_size-1 ;next 3 words are size &
dd 0x0 ;address of beginning of GDT, loaded in code
times 510 - ($ - $$) db 0
dw 0xaa55

Original code from wasm.in, with slight modifications.

Upvotes: 1

Views: 865

Answers (1)

Michael Petch
Michael Petch

Reputation: 47573

In real mode there is an implied segment on all memory operands. If the memory operand doesn't contain BP as a base then the implied segment is DS. If the memory operand does contain BP the implied base is SS. Your memory operands don't use BP so the implied segment is DS. Instructions with a memory operand like this:

mov [ENTRY_POINT_32],eax

Are equivalent to:

mov [ds:ENTRY_POINT_32],eax

Real mode uses segment:offset addressing to arrive at a physical memory address. If DS is wrong you will be writing to the wrong memory location. 20-bit Physical Address = (segment<<4)+offset.

With that being said, when your bootloader starts you can't rely on the segment and general purpose registers being the values you expect with the exception of DL which contains the boot drive passed by the BIOS. You can read my Bootloader Tips for more information on bootloader development.

You need to explicitly set the DS register. Since your code is using org 0x7c00 you need a DS segment set to zero. (0x0000<<4)+0x7c00 = 0x07c00 (physical address). The bootloader is always loaded by the BIOS to physical address 0x07c00.

You also have these two lines:

xor ax,ax
xor eax,eax

The first is unnecessary since you set all of EAX to zero with the latter. The following line is unnecessary if you use the bits 32 NASM directive before your 32-bit code:

db 0x66                      ;prefix of opcode to change bitness

The GDTR is also set up incorrectly. You compute the size improperly. You have this code:

GDT_size db $-GDT ;size of GDT table
GDTR dw GDT_size-1 ;next 3 words are size &

You create a memory location with a byte containing the size of the GDT. GDTR dw GDT_size-1 takes the offset of the label GDT_size and subtract one from it. This only works because the offset of label GDT_size is greater than the size of the GDT. You could do something like:

GDT:
    NULL_descr: dd 0x0,0x0      ; must be first entry in GDT

    ; descriptor of 32 bit code segment, base 0, size ffffffff
    CODE_descr: db  0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0

    ; descriptor of 32 bit data segment, base 0, size ffffffff
    DATA_descr: db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0

    ; descriptor of video buffer, base 0x000B8000, size ffff
    VIDEO_descr: db 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
GDT_END:

GDTR dw GDT_END-GDT-1            ; Size of GDT (minus 1)
     dd 0x0                      ; address of beginning of GDT, loaded in code

When creating self-modifying code you also need to concern yourself with clearing the instruction prefetch queue to ensure the processor sees the changes to the code. The processor may have already pre-read the FAR JMP instruction you are modifying and isn't aware of the change you made to the code. That can be rectified by simply inserting a JMP into the code after you modify the instruction. After you update the instruction with the computed address you can do something like:

    mov [ENTRY_OFF],eax
    jmp clear_prefetch          ; Clear the instruction prefetch queue
                                ;     by jumping to next instruction
clear_prefetch:

Working code (I have cleaned up the formatting a bit) could look like:

bits 16
org 0x7c00

start:
    xor eax,eax
    mov ds, ax                  ; Explicitly set DS to zero

    add eax,ENTRY_POINT_32      ; address to plug to far jmp
    mov [ENTRY_OFF],eax
    jmp clear_prefetch          ; Clear the instruction prefetch queue
                                ;     by jumping to next instruction
clear_prefetch:

    xor eax,eax
    mov eax,GDT                 ; load GDT label address
    mov [GDTR+2],eax            ; load it into address space in GDTR
    lgdt [GDTR]                 ; load GDTR

    cli                         ; turn off masked interrupts
    in al,0x70
    or al,0x80
    out 0x70,al                 ; turn off nonmasked interrupts
    in al,0x92
    or al,2
    out 0x92, al                ; enable A20 line
    mov eax,cr0
    or al,1
    mov cr0,eax                 ; switch to protected mode

    db 0x66                     ; prefix of opcode to change bitness
    db 0xEA                     ; opcode of jmp far
ENTRY_OFF:
    dd 0x0                      ; 32 bit offset of 32 bit instructions
    dw 00001000b                ; selector 1st descriptor CODE_descr,=1

bits 32
ENTRY_POINT_32:
    jmp $                       ; infinite jump to the same location

GDT:
    NULL_descr: dd 0x0,0x0      ; must be first entry in GDT

    ; descriptor of 32 bit code segment, base 0, size ffffffff
    CODE_descr: db  0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0

    ; descriptor of 32 bit data segment, base 0, size ffffffff
    DATA_descr: db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0

    ; descriptor of video buffer, base 0x000B8000, size ffff
    VIDEO_descr: db 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
GDT_END:

GDTR dw GDT_END-GDT-1            ; Size of GDT (minus 1)
     dd 0x0                      ; address of beginning of GDT, loaded in code

times 510 - ($ - $$) db 0
dw 0xaa55

No Need for Run-time Computing of FAR JMP in a Bootloader

Your code is overly complex for this situation. Legacy BIOSes on the x86 always load the bootloader to physical address 0x07c00. An advantage to using an ORG 0x7c00 and setting the segments to 0x0000 is that 0x0000:0x7c00 and the linear address (same as physical address in real mode) are the same offset of 0x07c00 from the start of memory. You can use this to your advantage and avoid unnecessary computation at run-time. The code could look like this:

bits 16
org 0x7c00

start:
    xor ax,ax
    mov ds,ax                   ; Explicitly set DS to zero

    lgdt [GDTR]                 ; load GDTR

    cli                         ; turn off masked interrupts
    in al,0x70
    or al,0x80
    out 0x70,al                 ; turn off nonmasked interrupts
    in al,0x92
    or al,2
    out 0x92, al                ; enable A20 line

    ; Enter protected mode
    mov eax,cr0
    or al,1
    mov cr0,eax                 ; switch to protected mode
    jmp CODE32_SEL:ENTRY_POINT_32

bits 32
ENTRY_POINT_32:
    mov eax, DATA32_SEL         ; Set the protected mode selector
    mov ds, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax
    mov esp, 0x9C000            ; Set protected mode stack below EBDA

    mov eax, VIDEO32_SEL        ; Set the video memory selector
    mov es, ax

    ; Print some characters to top left of the screen in white on magenta
    xor ebx, ebx
    mov word [es:ebx],   0x57 << 8 | 'M'
    mov word [es:ebx+2], 0x57 << 8 | 'D'
    mov word [es:ebx+4], 0x57 << 8 | 'P'

    jmp $                       ; infinite jump to the same location

GDT:
    NULL_descr: dd 0x0,0x0      ; must be first entry in GDT

    ; descriptor of 32 bit code segment, base 0, size ffffffff
    CODE_descr: db  0xFF,0xFF,0x0,0x0,0x0,10011010b,11001111b,0x0

    ; descriptor of 32 bit data segment, base 0, size ffffffff
    DATA_descr: db 0xFF,0xFF,0x0,0x0,0x0,10010010b,11001111b,0x0

    VIDEO_descr: db 0xFF,0xFF,0x0,0x80,0x0B,10010010b,01000000b,0x0
    ; descriptor of video buffer, base 0x000B8000, size ffff
GDT_END:
CODE32_SEL  equ CODE_descr-GDT
DATA32_SEL  equ DATA_descr-GDT
VIDEO32_SEL equ VIDEO_descr-GDT

GDTR dw GDT_END-GDT-1            ; Size of GDT (minus 1)
     dd GDT                      ; address of beginning of GDT

times 510 - ($ - $$) db 0
dw 0xaa55

This code computes the CODE and DATA selectors at assembly time. It also computes the GDTR at assembly time and hard codes the FAR JMP. It should be noted that since the bootloader and the 32-bit entry point are entirely inside the first 64KiB of memory you can use a 16-bit offset rather than 32-bit in the FAR JMP to protected mode. There is no need for self modifying code.

Note: Creating a selector for the video memory isn't necessary. You can always address that memory using the 32-bit 4GiB flat data selector.


When to use Code that Computes Addresses at Run-time?

The concept of building up a FAR JMP, and generating the GDTR record at run time isn't entirely without merit. In environments where the code may be placed in memory at different segments then you'd need to compute the FAR JMP and the GDT linear address for the GDTR at run-time. This would be the case if you were trying to enter protected mode from DOS via a COM or EXE program. The DOS loader decides what segment to place things in. In that case you'd have to compute the addresses at run-time. I wrote some code a couple of years ago for someone on IRC that does just that. My code doesn't disable the NMIs (it should), and it doesn't modify the FAR JMP. What I do is build the FAR JMP address on the stack, and then do an indirect FAR JMP via the address on the stack. The principle is the same as doing the self modifying code.

A sample DOS COM program that does run-time generation of the address for the FAR JMP on the stack and generates the GDT address in the GDTR is as follows:

; Assemble with NASM as
;     nasm -f bin enterpm.asm -o enterpm.com

STACK32_TOP EQU 0x200000
CODE32_REL  EQU 0x110000
VIDEOMEM    EQU 0x0b8000

use16
; COM program CS=DS=SS
org 100h

    call check_pmode    ; Check if we are already in protected mode
                        ;    This may be the case if we are in a VM8086 task.
                        ;    EMM386 and other expanded memory manager often
                        ;    run DOS in a VM8086 task. DOS extenders will have
                        ;    the same effect

    jz not_prot_mode    ; If not in protected mode proceed to switch
    mov dx, in_pmode_str;    otherwise print an error and exit back to DOS
    mov ah, 0x9
    int 0x21            ; Print Error
    ret

not_prot_mode:
    call a20_on         ; Enable A20 gate (uses Fast method as proof of concept)
    cli

    ; Compute linear address of label gdt_start
    ; Using (segment << 4) + offset
    mov eax,cs          ; EAX = CS
    shl eax,4           ; EAX = (CS << 4)
    mov ebx,eax         ; Make a copy of (CS << 4)
    add [gdtr+2],eax    ; Add base linear address to gdt_start address
                        ;     in the gdtr
    lgdt [gdtr]         ; Load gdt

    ; Compute linear address of label code_32bit
    ; Using (segment << 4) + offset
    add ebx,code_32bit  ; EBX = (CS << 4) + code_32bit

    push dword 0x08     ; CS Selector
    push ebx            ; Linear offset of code_32bit
    mov bp, sp          ; m16:32 address on top of stack, point BP to it

    mov eax,cr0
    or eax,1
    mov cr0,eax         ; Set protected mode flag

    jmp dword far [bp]  ; Indirect m16:32 FAR jmp with
                        ;    m16:32 constructed at top of stack
                        ;    DWORD allows us to use a 32-bit offset in 16-bit code

; 16-bit functions that run in real mode

; Check if protected mode is enabled, effectively checking if we are
; in in a VM8086 task. Set ZF to 1 if in protected mode

check_pmode:
    smsw ax
    test ax, 0x1
    ret


; Enable a20 (fast method). This may not work on all hardware
a20_on:
    cli
    in al, 0x92         ; Read System Control Port A
    test al, 0x02       ; Test current a20 value (bit 1)
    jnz .skipfa20       ; If already 1 skip a20 enable
    or al, 0x02         ; Set a20 bit (bit 1) to 1
    and al, 0xfe        ; Always write a zero to bit 0 to avoid
                        ;     a fast reset into real mode
    out 0x92, al        ; Enable a20
.skipfa20:
    sti
    ret

in_pmode_str: db "Processor already in protected mode - exiting",0x0a,0x0d,"$"

align 4
gdtr:
    dw gdt_end-gdt_start-1
    dd gdt_start

gdt_start:
    ; First entry is always the Null Descriptor
    dd 0
    dd 0

gdt_code:
    ; 4gb flat r/w/executable code descriptor
    dw 0xFFFF           ; limit low
    dw 0                ; base low
    db 0                ; base middle
    db 0b10011010       ; access
    db 0b11001111       ; granularity
    db 0                ; base high

gdt_data:
    ; 4gb flat r/w data descriptor
    dw 0xFFFF           ; limit low
    dw 0                ; base low
    db 0                ; base middle
    db 0b10010010       ; access
    db 0b11001111       ; granularity
    db 0                ; base high
gdt_end:

; Code that will run in 32-bit protected mode
; Align code to 4 byte boundary. code_32bit label is
; relative to the origin point 100h
align 4
code_32bit:
use32
; Set virtual memory address of pm code/data to CODE32_REL
; We will be relocating this section from low memory where DOS
; originally loaded it.
section protectedmode vstart=CODE32_REL, valign=4
start_32:
    cld                 ; Direction flag forward
    mov eax,0x10        ; 0x10 is flat selector for data
    mov ds,eax
    mov es,eax
    mov fs,eax
    mov gs,eax
    mov ss,eax
    mov esp,STACK32_TOP ; Should set ESP to a usable memory location
                        ; Stack will be grow down from this location

    mov edi,start_32    ; EDI = linear address where PM code will be copied
    mov esi,ebx         ; ESI = linear address of code_32bit
    mov ecx,PMSIZE_LONG ; ECX = number of DWORDs to copy
    rep movsd           ; Copy all code/data from code_32bit to CODE32_REL
    jmp 0x08:.relentry  ; Absolute jump to relocated code

.relentry:
    mov ah, 0x57        ; Attribute white on magenta

    ; Print a string to display
    mov esi,str         ; ESI = address of string to print
    mov edi,VIDEOMEM    ; EDI = base address of video memory
    call print_string_attr

    cli
endloop:
    hlt                 ; Halt CPU with infinite loop
    jmp endloop

print_string_attr:
    push ecx
    xor ecx,ecx         ; ECX = 0 current video offset
    jmp .loopentry
.printloop:
    mov [edi+ecx*2],ax  ; Copy attr and character to display
    inc ecx             ; Next word position
.loopentry:
    mov al,[esi+ecx]    ; Get next character to print
    test al,al
    jnz .printloop      ; If it's not NUL continue
.endprint:
    pop ecx
    ret

str: db "Protected Mode",0

PMSIZE_LONG equ ($-$$+3)>>2
                        ; Number of DWORDS that the protected mode
                        ;    code and data takes up (rounded up)

This code is a little more complex than I might give it credit for. The part of interest would be the pointer calculations in not_prot_mode which are similar to the types of calculations your code is doing. After entering protected mode the code relocates itself above DOS at 0x00110000. That was a requirement of the person who originally asked me about switching into protected mode.

Note: This code only runs in an environment where protected mode is not already enabled. It will display an error and exit if being run inside a VM8086 task.

Upvotes: 3

Related Questions