Reputation: 7558
I have this assembly code (on Linux):
.globl _start
_start:
cli
xorw %ax,%ax # Set %ax to zero
movw %ax,%ds
movw %ax,%es
movw %ax,%ss
I first add .code16 at the top to generate 16-bit code, then replace it with .code32 to generate 32-bit code. I assemble and link with these two commands:
gcc -m32 -nostdinc -c file.s
ld -m elf_i386 -o file.exe file.o
And then I examine the result with
objdump -d file.exe
For the first case (.code16) I get this output:
08048054 <_start>:
8048054: fa cli
8048055: 31 c0 xor %eax,%eax
8048057: 8e d8 mov %eax,%ds
8048059: 8e c0 mov %eax,%es
804805b: 8e d0 mov %eax,%ss
For the second case (.code32) I get this output:
08048054 <_start>:
8048054: fa cli
8048055: 66 31 c0 xor %ax,%ax
8048058: 8e d8 mov %eax,%ds
804805a: 8e c0 mov %eax,%es
804805c: 8e d0 mov %eax,%ss
I understand the 66 operand-size prefix part. What confuses me is the assembly mnemonics printed. Shouldn't xor %eax,%eax be printed for the .code32 case as well? Or should it print xor %ax,%ax for the .code16 case? Can somebody please clarify?
Upvotes: 1
Views: 4593
Reputation: 364220
.code16 tells the assembler to assume the code will run in 16-bit mode, e.g. to use the 66 operand-size prefix for 32-bit operand size instead of the default 16. However, you assemble and link it into an ELF32 binary, so the file metadata still indicates 32-bit code. (There's no such thing as an x86-16 Linux ELF file.)
Objdump disassembles according to the file metadata, thus as 32-bit code, unless you override that with -m i8086. The sizes you're getting match the binary for 32-bit disassembly.
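A toy sketch of that effect (my own illustration, not a real disassembler): the 66 prefix toggles operand size relative to whatever default the decoder assumes, so the exact same bytes name different registers depending on the assumed mode.

```python
# Toy model of operand-size decoding for a single prefixed instruction.
# 0x66 flips the operand size relative to the mode's default (16 <-> 32).

def operand_size(code_bytes, default_bits):
    """Operand size (16 or 32) for an instruction with at most a 66 prefix."""
    if code_bytes[0] == 0x66:
        return 16 if default_bits == 32 else 32
    return default_bits

def describe_xor(code_bytes, default_bits):
    """Render the 31 /r xor-accumulator encoding as AT&T syntax."""
    reg = "%ax" if operand_size(code_bytes, default_bits) == 16 else "%eax"
    return f"xor {reg},{reg}"

# .code16 bytes (31 c0), but objdump assumes 32-bit from the ELF metadata:
print(describe_xor(bytes([0x31, 0xC0]), 32))        # xor %eax,%eax
# .code32 bytes (66 31 c0), again read with a 32-bit default:
print(describe_xor(bytes([0x66, 0x31, 0xC0]), 32))  # xor %ax,%ax
# Read with the default each was actually assembled for, both mean xor %ax,%ax:
print(describe_xor(bytes([0x31, 0xC0]), 16))        # xor %ax,%ax
```

This reproduces exactly the "swapped" mnemonics in the question's two objdump listings.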
You'll actually see breakage if you assemble an instruction that has a different length in 16-bit mode, like
add $129, %ax # 129 doesn't fit in an imm8
If assembled as a 16-bit instruction, it will have no prefix and an imm16 source operand. Decoded as a 32-bit instruction, it will have an imm32 source operand, which takes more total bytes following the opcode. In either mode, an operand-size prefix changes the length of the rest of the instruction (not counting prefixes). BTW, (pre-)decoding slows down on Intel CPUs for this special case where a prefix is length-changing for the rest of the instruction (https://agner.org/optimize/).
Anyway, disassembling that instruction with the wrong code size will lead the disassembler to get out of sync with instruction boundaries, so it definitively tests which mode it's being decoded for.
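A minimal sketch of that desync (again my own illustration, covering only the 05 add-to-accumulator opcode): the immediate is 2 bytes under a 16-bit default and 4 bytes under a 32-bit default, so a wrong-mode decoder swallows bytes belonging to the next instruction.

```python
# Why wrong-mode decoding loses instruction boundaries: opcode 0x05
# (add imm, accumulator) carries an imm16 in 16-bit mode, imm32 in 32-bit.

def imm_len(default_bits, has_66_prefix):
    """Immediate size in bytes, given the mode default and a 66 prefix."""
    size = (48 - default_bits) if has_66_prefix else default_bits
    return size // 8

def decode_add_acc(code, default_bits):
    """Decode one 0x05 instruction; return (total length, immediate)."""
    i = 1 if code[0] == 0x66 else 0
    assert code[i] == 0x05, "expected the add-accumulator opcode"
    n = imm_len(default_bits, code[0] == 0x66)
    imm = int.from_bytes(code[i + 1:i + 1 + n], "little")
    return i + 1 + n, imm

# 'add $129, %ax' assembled with .code16: 05 81 00 (3 bytes, imm16 = 0x81)
insn16 = bytes([0x05, 0x81, 0x00])
print(decode_add_acc(insn16, 16))   # (3, 129)

# A 32-bit decoder reads 4 immediate bytes, so it also consumes the start
# of whatever comes next (here: fa = cli, 90 = nop) and is now out of sync.
stream = insn16 + bytes([0xFA, 0x90])
length, imm = decode_add_acc(stream, 32)
print(length, hex(imm))             # 5 0x90fa0081
```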
If you're making normal user-space code (not a kernel that switches modes, or code that needs to be 16-bit), .code32 and .code64 are useless. They just let you put machine code into the wrong kind of ELF file. (See: Assembling 32-bit binaries on a 64-bit system (GNU toolchain).)
BTW, a mov to %ss implicitly prevents interrupts until after the next instruction (which should set the stack pointer). You can avoid cli/sti that way.
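A minimal sketch of that idiom (the 0x7c00 stack value is just an illustrative choice, not from the question's code):

```asm
movw %ax, %sp_seg_setup:
movw %ax, %ss      # loading %ss inhibits interrupts...
movw $0x7c00, %sp  # ...until after this next instruction completes
```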
Upvotes: 4