fuz

Reputation: 93172

Why does the Solaris assembler generate different machine code than the GNU assembler here?

I wrote this little assembly file for amd64. What the code does is not important for this question.

        .globl fib

fib:    mov %edi,%ecx
        xor %eax,%eax
        jrcxz 1f
        lea 1(%rax),%ebx

0:      add %rbx,%rax
        xchg %rax,%rbx
        loop 0b

1:      ret

Then I proceeded to assemble and then disassemble this on both Solaris and Linux.

Solaris

$ as -o y.o -xarch=amd64 -V y.s                            
as: Sun Compiler Common 12.1 SunOS_i386 Patch 141858-04 2009/12/08
$ dis y.o                                                  
disassembly for y.o


section .text
    0x0:                    8b cf              movl   %edi,%ecx
    0x2:                    33 c0              xorl   %eax,%eax
    0x4:                    e3 0a              jcxz   +0xa      <0x10>
    0x6:                    8d 58 01           leal   0x1(%rax),%ebx
    0x9:                    48 03 c3           addq   %rbx,%rax
    0xc:                    48 93              xchgq  %rbx,%rax
    0xe:                    e2 f9              loop   -0x7      <0x9>
    0x10:                   c3                 ret    

Linux

$ as --64 -o y.o -V y.s
GNU assembler version 2.22.90 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.22.90.20120924
$ objdump -d y.o

y.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <fib>:
   0:   89 f9                   mov    %edi,%ecx
   2:   31 c0                   xor    %eax,%eax
   4:   e3 0a                   jrcxz  10 <fib+0x10>
   6:   8d 58 01                lea    0x1(%rax),%ebx
   9:   48 01 d8                add    %rbx,%rax
   c:   48 93                   xchg   %rax,%rbx
   e:   e2 f9                   loop   9 <fib+0x9>
  10:   c3                      retq   

How come the generated machine code is different? Sun as generates 8b cf for mov %edi,%ecx while gas generates 89 f9 for the very same instruction. Is this just because x86 allows several encodings of the same instruction, or is there a real difference between these two encodings?

Upvotes: 5

Views: 509

Answers (2)

FrankH.

Reputation: 18247

You've not specified the operand size for the mov, xor and add operations. This creates some ambiguity. The GNU assembler manual, i386 Mnemonics, mentions this:

If no suffix is specified by an instruction then as tries to fill in the missing suffix based on the destination register operand (the last one by convention). [ ... ] . Note that this is incompatible with the AT&T Unix assembler which assumes that a missing mnemonic suffix implies long operand size.

This suggests the two assemblers choose differently: the GNU assembler picks the opcode where the r/m field of the ModR/M byte holds the destination operand (because the destination size is known/implied), while the AT&T-style one picks the opcode where the r/m field holds the source operand (because the operand size is implied).

I tried that experiment, though: giving explicit operand sizes in your assembly source doesn't change the GNU assembler output. There is, however, another part of the same documentation:

Different encoding options can be specified via optional mnemonic suffix. `.s' suffix swaps 2 register operands in encoding when moving from one register to another.

which one can use. The following source code, assembled with GNU as, produces the same opcodes you got from Solaris as:

.globl fib

fib:    movl.s %edi,%ecx
        xorl.s %eax,%eax
        jrcxz 1f
        leal 1(%rax),%ebx

0:      addq.s %rbx,%rax
        xchgq %rax,%rbx
        loop 0b

1:      ret
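To check that the swapped encodings really describe the same operations, here is a minimal Python sketch. It is not part of either toolchain, and `decode` and its tables are names made up for this illustration. It handles only the register-to-register forms of mov, xor and add from the listings above, and assumes mod = 11 and no REX bits other than REX.W.

```python
# Illustrative decoder for the three instructions that differ between the
# two listings. Names and tables here are assumptions, not a real API.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

# Opcodes with the two low bits (direction and width) cleared.
OPS = {0x88: "mov", 0x30: "xor", 0x00: "add"}


def decode(hex_bytes):
    """Return (mnemonic, source, destination) in AT&T operand order."""
    code = bytes.fromhex(hex_bytes)
    wide = code[0] == 0x48            # REX.W prefix selects 64-bit operands
    if wide:
        code = code[1:]
    opcode, modrm = code
    assert opcode & 0x01, "only the 32/64-bit forms are handled"
    assert modrm >> 6 == 0b11, "only register-to-register forms are handled"
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    if opcode & 0x02:                 # direction bit set: reg field is the destination
        src, dst = rm, reg
    else:                             # direction bit clear: r/m field is the destination
        src, dst = reg, rm
    names = REG64 if wide else REG32
    return OPS[opcode & 0xFC], names[src], names[dst]


# Solaris bytes on the left, GNU bytes on the right.
for sun, gnu in [("8bcf", "89f9"), ("33c0", "31c0"), ("4803c3", "4801d8")]:
    print(sun, gnu, decode(sun), decode(gnu))
```

Each pair decodes to the same mnemonic and operands, so the two object files differ only in which of the two equivalent encodings was chosen.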

Upvotes: 1

Drew McGowen

Reputation: 11716

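Some x86 instructions have multiple encodings that do the same thing. In particular, many instructions that act on two registers (mov and basic ALU operations such as add and xor) exist in two forms: one where the reg field of the ModR/M byte is the source and one where it is the destination. Swapping the two register fields and flipping the direction bit in the opcode gives a different byte sequence with the same effect.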

Which one a given assembler/compiler picks simply depends on what the tool authors chose.
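As a rough illustration of the direction bit, the following Python sketch builds both encodings of a two-register instruction. The function `encode` and its tables are hypothetical helpers written for this answer, not any assembler's API, and they cover only mov, xor and add on the low eight registers.

```python
# Illustrative encoder: names and tables are assumptions for this sketch.
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

# Opcodes in the form where the r/m field is the destination (direction bit clear).
BASE_OPCODE = {"mov": 0x89, "xor": 0x31, "add": 0x01}


def encode(op, src, dst, wide=False, swap=False):
    """Encode `op src, dst` (AT&T order) for two registers.

    swap=False puts the source in the reg field and the destination in r/m.
    swap=True sets the direction bit (0x02) and exchanges the two fields.
    wide=True adds a REX.W prefix for 64-bit operands.
    """
    opcode = BASE_OPCODE[op]
    if swap:
        opcode |= 0x02
        reg, rm = REG[dst], REG[src]
    else:
        reg, rm = REG[src], REG[dst]
    modrm = 0xC0 | (reg << 3) | rm    # mod = 11: both operands are registers
    prefix = [0x48] if wide else []
    return bytes(prefix + [opcode, modrm])


print(encode("mov", "di", "cx").hex())                        # 89f9
print(encode("mov", "di", "cx", swap=True).hex())             # 8bcf
print(encode("add", "bx", "ax", wide=True).hex())             # 4801d8
print(encode("add", "bx", "ax", wide=True, swap=True).hex())  # 4803c3
```

The unswapped results match the GNU listing in the question and the swapped ones match the Solaris listing.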

Upvotes: 7
