David542
David542

Reputation: 110153

Defining variables in asm

Is the following a valid way to define variables in asm?

.globl main
.globl a, b, c, d

a: .byte    4
b: .value   7
c: .long    0x0C    # 11
d: .quad    9

main:
    mov     $0,         %eax
    add     a(%rip),    %eax
    add     b(%rip),    %eax
    add     c(%rip),    %eax
    add     d(%rip),    %eax
    ret

For example, is it required/suggested to have a .TEXT section? How exactly does a(%rip) resolve to the value of $4? That seems almost like magic to me.

Upvotes: 2

Views: 6461

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 364160

The default section is .text; lines before any section directive assemble into the .text section. So you do have one, and in fact you put everything in it, including your data. (Or read-only constants.) Normally you should put static constants in .rodata (or .rdata on Windows), not in .text, for performance reasons. (Mixing code and data wastes space in the I-cache and D-cache, and in TLBs.)


It doesn't resolve to an immediate $4 at assemble time, it resolves to an address. In this case, using a RIP-relative addressing mode. See what does "mov offset(%rip), %rax" do? / How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more about the fact that it means "address of symbol a with respect to RIP", not actually RIP + absolute address of the symbol.

In other cases, the symbol a does generally resolve to its (32-bit) absolute address when used as add $a, %rdi or something.

It's only at runtime that the CPU loads data (which you put there with directives like .long) from that static storage. If you changed what was in memory (e.g. with a debugger, or by running other instructions) before add c(%rip), %eax executed, it would load a different value.

You put your constant data in .text, along with the code, which is generally not what you want for performance reasons. But it means the assembler can resolve the RIP-relative addressing at assemble time instead of only using a relocation that the linker has to fill in. Although it seems GAS chooses not to resolve the references and still leaves it for the linker:

$ gcc -c foo.s
$ objdump -drwC -Matt foo.o

foo.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <a>:
   0:   04                      .byte 0x4

0000000000000001 <b>:
   1:   07                      (bad)  
        ...

0000000000000003 <c>:
   3:   0c 00                   or     $0x0,%al
        ...

0000000000000007 <d>:
   7:   09 00                   or     %eax,(%rax)
   9:   00 00                   add    %al,(%rax)
   b:   00 00                   add    %al,(%rax)
        ...

000000000000000f <main>:
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   03 05 00 00 00 00       add    0x0(%rip),%eax        # 1a <main+0xb>    16: R_X86_64_PC32       a-0x4
  1a:   03 05 00 00 00 00       add    0x0(%rip),%eax        # 20 <main+0x11>   1c: R_X86_64_PC32       b-0x4
  20:   03 05 00 00 00 00       add    0x0(%rip),%eax        # 26 <main+0x17>   22: R_X86_64_PC32       c-0x4
  26:   03 05 00 00 00 00       add    0x0(%rip),%eax        # 2c <main+0x1d>   28: R_X86_64_PC32       d-0x4
  2c:   c3                      retq   

(Attempted disassembly of your data as instructions happens because you put them in .text. objdump -d only disassembles .text, and non-immediate constants are normally placed in .rodata.)

Linking it into a executable resolves those symbol references:

$ gcc -nostdlib -static foo.s    # not a working executable, just link it without extra stuff
$ objdump -drwC -Matt a.out
... (bogus data omitted)

000000000040100f <main>:
  40100f:       b8 00 00 00 00          mov    $0x0,%eax
  401014:       03 05 e6 ff ff ff       add    -0x1a(%rip),%eax        # 401000 <a>
  40101a:       03 05 e1 ff ff ff       add    -0x1f(%rip),%eax        # 401001 <b>
  401020:       03 05 dd ff ff ff       add    -0x23(%rip),%eax        # 401003 <c>
  401026:       03 05 db ff ff ff       add    -0x25(%rip),%eax        # 401007 <d>
  40102c:       c3                      retq   

Note the 32-bit little-endian 2's complement encoding of the relative offsets in the RIP+rel32 addressing modes. (And the comment with the absolute address, added by objdump for convenience in this disassembly output.)


BTW, most assemblers including GAS have macro facilities, so you could have used a = 4 or .equ a, 4 to define it as an assemble-time constant, instead of emitting data into the output there. Then you'd use it as add $a, %eax, which would assemble to an add $sign_extended_imm8, r/m32 opcode.


Also, all your loads are dword sized (determined by the register operand), so only 1 of them matches the size of the data directives you used. Single-step through your code and look at the high bits of EAX.

Assembly language doesn't really have variables. It has tools you can use to implement the high-level-language concept of variables, including variables with static storage class. (A label and some space in .data or .bss. Or .rodata for const "variables".)

But if you use the tools differently, you can do things like load 4 bytes that span the .byte, the .value (16-bit), and the first byte of the .long. So after the first instruction, you'll have EAX += 0x0c000704 (because x86 is little-endian). This is totally legal to write in assembler, and nothing is checking to enforce the concept of a variable as ending before the next label.

(Unless you use MASM, which does have variables; in that case you'd have had to write add eax, dword ptr [a]; without the size override MASM would complain about the mismatch between a dword register and a byte variable. Other flavours of asm syntax, like NASM and AT&T, assume you know what you're doing and don't try to be "helpful".)

Upvotes: 6

Related Questions