Hritik

Reputation: 722

Offset before square bracket in x86 intel asm on GCC

From all the docs I've found, there is no mention of syntax like offset[var+offset2] in Intel x86 syntax but GCC with the following flags

gcc -S hello.c -o - -masm=intel

for this program

#include<stdio.h>
int main(){
    char c = 'h';
    putchar(c);
    return 0;
}

produces

    .file   "hello.c"
    .intel_syntax noprefix
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    push    rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    mov rbp, rsp
    .cfi_def_cfa_register 6
    sub rsp, 16
    mov BYTE PTR -1[rbp], 104
    movsx   eax, BYTE PTR -1[rbp]
    mov edi, eax
    call    putchar@PLT
    mov eax, 0
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Arch Linux 9.3.0-1) 9.3.0"
    .section    .note.GNU-stack,"",@progbits

I'd like to highlight the line mov BYTE PTR -1[rbp], 104, where the offset -1 appears outside the square brackets. TBH, I'm just guessing that it is an offset. Can anyone direct me to proper documentation covering this?

Here is a similar question: Squared Brackets in x86 asm from IDA where a comment does mention that it is an offset but I'd really like a proper documentation reference.

Upvotes: 3

Views: 942

Answers (1)

Peter Cordes

Reputation: 365737

Yes, it's just another way of writing [rbp - 1], and the -1 is a displacement in technical x86 addressing mode terminology (see footnote 1).

The GAS manual's section on x86 addressing modes only mentions the [ebp - 4] possibility, not -4[ebp], but GAS does assemble it.

And disassembly in AT&T or Intel syntax confirms what it meant. x86 addressing modes are constrained by what the machine can encode (Referencing the contents of a memory location. (x86 addressing modes)), so there isn't a lot of wiggle room on what some syntax might mean. (This syntax was emitted by GCC so we can safely assume that it's valid. And that it means the same thing as the -1(%rbp) it emits in AT&T syntax mode.)
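As a rough model of why the two spellings have to agree: the general x86 effective address is base + index*scale + displacement, so where the displacement is written can't change the result. The sketch below is only illustrative arithmetic in C; effective_address and its parameter names are hypothetical, not part of GCC or GAS.

```c
#include <stdint.h>

/* Hypothetical helper: models the x86-64 effective-address computation
 * base + index*scale + displacement, with 64-bit wraparound.
 * scale is expected to be 1, 2, 4, or 8. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    /* The 32-bit displacement is sign-extended to 64 bits before adding. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, if rbp held 0x7ffc0010, both -1[rbp] and [rbp - 1] would correspond to effective_address(0x7ffc0010, 0, 1, -1), i.e. 0x7ffc000f.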

Footnote 1: The whole rbp-1 effective address is the offset part of a seg:off address. The segment base is fixed at 0 in 64-bit mode, except for FS and GS, and even in 32-bit mode mainstream OSes use a flat memory model, so you can ignore the segment base. I point this out only because "offset" in x86 terminology does have a specific technical meaning separate from "displacement", in case you care about using terminology that matches Intel's manuals.


For some reason GCC's choice of syntax depends on whether -fno-pie is in effect. https://godbolt.org/z/iK9jh6 (On modern GNU/Linux distros like your Arch system, -fpie is enabled by default; on Godbolt it isn't.)

This choice continues with optimization enabled, if you use volatile to force the stack variable to be written, or do other stuff with pointers: e.g. https://godbolt.org/z/4P92Fk. It applies to arbitrary dereferences like ptr[1 + x] from function args.

  • GCC -fno-pie chooses [rbp - 1] and [rdi+4+rsi*4]
  • GCC -fpie chooses -1[rbp] and 4[rdi+rsi*4]
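To see the two spellings yourself, something like the following could be compiled twice, e.g. with gcc -O2 -S -masm=intel -fpie and again with -fno-pie in place of -fpie. This is a minimal sketch: the function names load_next and store_local are made up for illustration, and the exact registers and offsets in the output may vary with GCC version and options.

```c
/* Dereference at a runtime index plus a constant offset, as in ptr[1 + x].
 * Per the list above, GCC's Intel-syntax output addresses this as
 * [rdi+4+rsi*4] with -fno-pie or 4[rdi+rsi*4] with -fpie. */
int load_next(const int *ptr, long x)
{
    return ptr[1 + x];
}

/* volatile forces the store to the stack slot even with optimization,
 * so a stack access with a displacement still appears in the output. */
int store_local(void)
{
    volatile char c = 'h';
    return c;
}
```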

IDK why GCC's internals choose differently based on PIE mode. No obvious reason; perhaps for some reason they just use different code paths in GCC's internals, or different format strings and they just happen to make different choices.

Both with and without PIE, a global (static storage) is referenced as glob[rip], not [RIP + glob] which is also supported. In both cases that means glob with respect to RIP, not actually RIP + absolute address of the symbol. But that's an exception to the rule that applies for any other register, or for no register.
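A small test case for the global case, assuming x86-64 GCC with -S -masm=intel; glob and read_glob are placeholder names, and the instruction shown in the comment is what I'd expect rather than a guaranteed output.

```c
/* Placeholder global with static storage duration. */
int glob = 42;

/* With -masm=intel, GCC is expected to emit a RIP-relative load here,
 * written in the symbol[rip] form, e.g.:
 *     mov eax, DWORD PTR glob[rip]
 */
int read_glob(void)
{
    return glob;
}
```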


GAS .intel_syntax is MASM-like, and MASM certainly does support symbol[register] and I think even 1234[register]. It's more normal for the displacement.

Upvotes: 5
