cbot

Reputation: 117

Intel Assembly ljmp syntax from AT&T syntax

I am trying to convert the xv6 boot code from AT&T syntax to Intel syntax, and I have a problem with the ljmp instruction. I am trying to learn the boot process of Intel computers and I am not particularly strong with Intel assembly.

The original AT&T syntax is ljmp $0x8, $start32.

Minimal example:

.code16
   jmp 0x8:start32          # won't assemble

.code32
start32:
   nop

Using as -32 -msyntax=intel -mnaked-reg foo.s with GNU Binutils 2.35.1 produces
Error: junk ':start32' after expression for the far jmp line.

I am using GNU as and the GCC tools.
There might also be other problems with the assembly, such as in gdtdesc and gdt.

The full code ported to Intel syntax is:

# Start the first CPU: switch to 32-bit protected mode, jump into C.
# The BIOS loads this code from the first sector of the hard disk into
# memory at physical address 0x7c00 and starts executing in real mode
# with cs = 0 and ip = 7c00.
.code16
.global start
start:
    # Disable interrupts.
    cli

    # Zero data segment registers DS, ES, and SS.
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax

seta20.1:
    # Wait for not busy.
    in al, 0x64
    test al, 0x2
    jnz seta20.1

    # 0xd1 -> port 0x64
    mov al, 0xd1
    out 0x64, al

seta20.2:
    # Wait for not busy.
    in al, 0x64
    test al, 0x2
    jnz seta20.2

    # 0xdf -> port 0x60
    mov al, 0xdf
    out 0x60, al

    # Switch from real to protected mode. Use a bootstrap GDT that makes
    # virtual addresses map directly to physical addresses so that the
    # effective memory map doesn't change during the transition.
    lgdt gdtdesc

    # Protection Enable in cr0 register.
    mov eax, cr0
    or eax, 0x1
    mov cr0, eax

    # Complete the transition to 32-bit protected mode by using a long jmp
    # to reload cs and eip. The segment descriptors are set up with no
    # translation, so that the mapping is still the identity mapping.

    # This instruction is giving me problems.
    ljmp start32, 0x8

.code32
start32:
    # Set up the protected-mode data segment registers
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov ss, ax

    # Zero the segments not ready for use.
    xor ax, ax
    mov fs, ax
    mov gs, ax

    # Set up the stack pointer and call into C.
    mov esp, start
    call bootmain

    # If bootmain returns, spin.
spin:
    hlt
    jmp spin

# Bootstrap GDT set up null segment, code segment, and data segment respectively.
# Force 4 byte alignment.
.p2align 2
gdt:
    .word 0x0000, 0x0000
    .byte 0, 0, 0, 0
    .word 0xffff, 0x0000
    .byte 0, 0x9a, 0xcf, 0
    .word 0xffff, 0x0000
    .byte 0, 0x92, 0xcf, 0

# sizeof(gdt) - 1 and address of gdt respectively.
gdtdesc:
    .word (gdtdesc - gdt - 1)
    .long gdt

Upvotes: 2

Views: 3529

Answers (2)

Michael Petch

Reputation: 47603

In the complete translated code you presented, this line is incorrect:

ljmp start32, 0x8

The proper syntax for a FAR JMP in GNU Assembler's Intel syntax is:

ljmp 0x08, start32

The selector value comes first and the offset second. It seems that in translating from AT&T syntax you reversed these two values when the order should have remained the same. With the values reversed you would have gotten the error Error: can't handle non absolute segment in 'ljmp'. In GNU Assembler's Intel syntax you can also substitute jmp for ljmp, so jmp 0x08, start32 works as well.

There are different flavors of Intel syntax. jmp 0x8:start32 is NASM's Intel syntax, and it differs from GNU Assembler's Intel syntax in the separator: NASM uses : where GNU Assembler uses ,. If you use a : to separate the two values you get the error Error: junk ':start32' after expression in GNU Assembler.
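
For reference, here is a minimal sketch of the corrected jump in a file of its own. It assumes GNU as and selects Intel syntax with an explicit .intel_syntax noprefix directive instead of command-line options. start32 is just the label from the question, and 0x08 is the selector of the code segment in the question's GDT.

```asm
.intel_syntax noprefix      # assumption: syntax chosen in the file, not on the command line
.code16
    ljmp 0x08, start32      # far jump: selector first, then offset
                            # (jmp 0x08, start32 is the equivalent spelling)

.code32
start32:
    nop
```

The operand order is the same as in the AT&T original, ljmp $0x8, $start32; only the $ prefixes are dropped.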


Notes

  • If the code in bootmain doesn't work, it is likely an issue unrelated to the bootloader code you presented in this question. If you are also building all the C code with Intel syntax rather than AT&T syntax, then make sure all the inline assembly has been properly converted, since the source and destination operands are reversed there as well. xv6 likely has inline assembly in a number of files, including xv6-public/x86.h, xv6-public/spinlock.c, xv6-public/usertests.c and xv6-public/stressfs.c.

Upvotes: 6

Peter Cordes

Reputation: 365517

You can use jmp 0x08, start32.

For some reason, jmp 0x8:start32 only works after .intel_syntax noprefix, even with command line args that should be equivalent. This is the syntax used by Binutils objdump -d -Mintel -mi8086, e.g. ea 16 00 08 00 jmp 0x8:0x16 so it's probably a GAS bug that it's not accepted sometimes.


I edited your question to create a small reproducible example with as 2.35.1 (which I have on Arch GNU/Linux) based on your comments replying to Jester. I included command line options: I assume you must have been using those because there's no .intel_syntax noprefix directive in your file.

That seems to be the problem: -msyntax=intel -mnaked-reg makes other Intel syntax things work, like xor ax,ax, but does not make jmp 0x8:start32 work (or other ways of writing it). Only a .intel_syntax noprefix directive (see footnote 1) makes that syntax for far jmp work.

# .intel_syntax noprefix        # rely on command line options to set this
.code16
   xor  ax, ax              # verify that command-line setting of intel_syntax worked, otherwise this line errors.

   ljmp 0x8, start32        # Working before or after a syntax directive, but is basically AT&T syntax
#   jmp 0x8:start32          # fails here, works after a directive
   jmp 0x8, start32         # Michael Petch's suggested syntax that's still somewhat AT&Tish.  works with just cmdline opts. 

.att_syntax
   ljmp $0x8, $start32      # working everywhere, even with clang
.intel_syntax noprefix
   jmp 0x8:start32          # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive

.code32
start32:
   nop

I verified that -msyntax=intel -mnaked-reg work for other instructions where their effect is necessary: movzx ax, al works. Without -mnaked-reg we'd get "too many memory references" because "ax" and "al" would be taken as symbol names, and without -msyntax=intel we'd get "operand size mismatch".

A GAS listing from as -32 -msyntax=intel -mmnemonic=intel -mnaked-reg -o foo.o foo.s -al --listing-lhs-width=2 --listing-rhs-width=140
(I'm pretty sure -mmnemonic=intel is irrelevant, and implied by syntax=intel.)

Note that you can see which instructions worked because they have machine code, and which didn't (the first jmp 0x8:start32) because the left-hand column is empty for it. The very first column would normally be addresses, but is ???? because assembly failed. (Because I uncommented the jmp 0x8:start32 to show it failing the first time, working the 2nd time.)

foo.s: Assembler messages:
foo.s:6: Error: junk `:start32' after expression
GAS LISTING foo.s                       page 1


   1                            # .intel_syntax noprefix        # rely on command line options to set this
   2                            .code16
   3 ???? 0FB6C0                   movzx   ax, al              # verify that command-line setting of intel_syntax worked, otherwise this line errors.
   4                       
   5 ???? EA170008 00              ljmp 0x8, start32        # Working before or after a syntax directive, but is basically AT&T syntax
   6                               jmp 0x8:start32          # fails here, works after a directive
   7 ???? EA170008 00              jmp 0x8, start32         # Michael Petch's suggested syntax that's still somewhat AT&Tish.  works with just cmdline opts. 
   8                       
   9                            .att_syntax
  10 ???? EA170008 00              ljmp $0x8, $start32      # working everywhere, even with clang
  11                            .intel_syntax noprefix
  12 ???? EA170008 00              jmp 0x8:start32          # objdump disassembly syntax, but only works after a .intel_syntax noprefix directive
  13                       
  14                            .code32
  15                            start32:
  16 ???? 90                       nop
  17                       

(GAS does listing field widths for the left column in "words", which apparently means 32-bit chunks. That's why the 00 most-significant byte of the segment selector is separated by a space.)

Putting a label before the jmp 0x8:label didn't help; it's not an issue of forward vs. backward reference. Even jmp 0x8:23 fails to assemble.


Syntax "recommended" by disassemblers, from a working build:

objdump -drwC -Mintel -mi8086 foo.o :

foo.o:     file format elf32-i386

Disassembly of section .text:

00000000 <start32-0x17>:
   0:   0f b6 c0                movzx  ax,al
   3:   ea 17 00 08 00          jmp    0x8:0x17 4: R_386_16     .text
   8:   ea 17 00 08 00          jmp    0x8:0x17 9: R_386_16     .text
   d:   ea 17 00 08 00          jmp    0x8:0x17 e: R_386_16     .text
  12:   ea 17 00 08 00          jmp    0x8:0x17 13: R_386_16    .text

00000017 <start32>:
  17:   90                      nop

llvm-objdump --mattr=+16bit-mode --x86-asm-syntax=intel -d foo.o :

00000000 <.text>:
       0: 0f b6 c0                      movzx   ax, al
       3: ea 17 00 08 00                ljmp    8, 23
       8: ea 17 00 08 00                ljmp    8, 23
       d: ea 17 00 08 00                ljmp    8, 23
      12: ea 17 00 08 00                ljmp    8, 23

00000017 <start32>:
      17: 90                            nop

And BTW, I didn't get clang 11.0 to assemble any Intel-syntax versions of this with a symbol name. ljmp 8, 12 assembles with clang, but not even ljmp 8, start32. Only by switching to AT&T syntax and back could I get clang's built-in assembler (clang -m32 -masm=intel -c) to emit a 16-bit mode far jmp.

.att_syntax
   ljmp $0x8, $start32      # working everywhere, even with clang
.intel_syntax noprefix

Keep in mind this direct form of far JMP is not available in 64-bit mode; perhaps that's why LLVM's built-in assembler appears to have spent less effort on it.


Footnote 1: Actually .intel_syntax prefix works, too, but never use that. Nobody wants to see the Franken-monster that is mov %eax, [%eax], or especially add %edx, %eax, which uses dst, src order but with AT&T-decorated register names.

Upvotes: 3
