Reputation: 110153
Is the following a valid way to define variables in asm?
.globl main
.globl a, b, c, d
a: .byte 4
b: .value 7
c: .long 0x0C # 12
d: .quad 9
main:
mov $0, %eax
add a(%rip), %eax
add b(%rip), %eax
add c(%rip), %eax
add d(%rip), %eax
ret
For example, is it required/suggested to have a .text section? How exactly does a(%rip) resolve to the value of $4? That seems almost like magic to me.
Upvotes: 2
Views: 6461
Reputation: 364160
The default section is .text; lines before any section directive assemble into the .text section. So you do have one, and in fact you put everything in it, including your data. (Or read-only constants.) Normally you should put static constants in .rodata (or .rdata on Windows), not in .text, for performance reasons. (Mixing code and data wastes space in the I-cache and D-cache, and in TLBs.)
It doesn't resolve to an immediate $4 at assemble time; it resolves to an address, in this case via a RIP-relative addressing mode. See what does "mov offset(%rip), %rax" do? / How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more about the fact that it means "address of symbol a with respect to RIP", not actually RIP + absolute address of the symbol.
In other cases, the symbol a does generally resolve to its (32-bit) absolute address, for example when used as an immediate in add $a, %rdi or something.
It's only at runtime that the CPU loads data (which you put there with directives like .long) from that static storage. If you changed what was in memory (e.g. with a debugger, or by running other instructions) before add c(%rip), %eax executed, it would load a different value.
You put your constant data in .text, along with the code, which is generally not what you want for performance reasons. But it does mean the assembler could resolve the RIP-relative displacements at assemble time instead of emitting relocations for the linker to fill in. In practice, though, GAS chooses not to resolve the references and still leaves them for the linker:
$ gcc -c foo.s
$ objdump -drwC -Matt foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 04 .byte 0x4
0000000000000001 <b>:
1: 07 (bad)
...
0000000000000003 <c>:
3: 0c 00 or $0x0,%al
...
0000000000000007 <d>:
7: 09 00 or %eax,(%rax)
9: 00 00 add %al,(%rax)
b: 00 00 add %al,(%rax)
...
000000000000000f <main>:
f: b8 00 00 00 00 mov $0x0,%eax
14: 03 05 00 00 00 00 add 0x0(%rip),%eax # 1a <main+0xb> 16: R_X86_64_PC32 a-0x4
1a: 03 05 00 00 00 00 add 0x0(%rip),%eax # 20 <main+0x11> 1c: R_X86_64_PC32 b-0x4
20: 03 05 00 00 00 00 add 0x0(%rip),%eax # 26 <main+0x17> 22: R_X86_64_PC32 c-0x4
26: 03 05 00 00 00 00 add 0x0(%rip),%eax # 2c <main+0x1d> 28: R_X86_64_PC32 d-0x4
2c: c3 retq
(Attempted disassembly of your data as instructions happens because you put it in .text. objdump -d only disassembles sections that are expected to contain code, such as .text, and non-immediate constants are normally placed in .rodata.)
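To see where the a-0x4 addend in those relocations comes from, here is a minimal Python sketch of the R_X86_64_PC32 calculation (S + A - P in the x86-64 psABI). The function name and the table are my own illustration, not part of any real tool; the offsets are read off the foo.o listing above. The CPU adds the displacement to the address of the next instruction, which is 4 bytes past the start of the 4-byte field being patched, hence A = -4.

```python
# Sketch of the R_X86_64_PC32 relocation formula: value = S + A - P
#   S = address of the symbol
#   A = addend stored in the relocation entry
#   P = address of the 4-byte field being patched
# Names here are illustrative, not taken from any real linker.

def pc32_value(sym_addr, addend, field_addr):
    return sym_addr + addend - field_addr

# (symbol, symbol offset in .text, offset of the disp32 field), from the foo.o listing
RELOCS = [("a", 0x0, 0x16), ("b", 0x1, 0x1C), ("c", 0x3, 0x22), ("d", 0x7, 0x28)]

displacements = {name: pc32_value(sym, -4, field) for name, sym, field in RELOCS}

for name, disp in displacements.items():
    print(f"{name}: {disp:#x}")
```

Because the data and the code sit in the same section, the result does not depend on where that section finally lands: the values come out as -0x1a, -0x1f, -0x23 and -0x25, the same displacements objdump shows after linking.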
Linking it into an executable resolves those symbol references:
$ gcc -nostdlib -static foo.s # not a working executable, just link it without extra stuff
$ objdump -drwC -Matt a.out
... (bogus data omitted)
000000000040100f <main>:
40100f: b8 00 00 00 00 mov $0x0,%eax
401014: 03 05 e6 ff ff ff add -0x1a(%rip),%eax # 401000 <a>
40101a: 03 05 e1 ff ff ff add -0x1f(%rip),%eax # 401001 <b>
401020: 03 05 dd ff ff ff add -0x23(%rip),%eax # 401003 <c>
401026: 03 05 db ff ff ff add -0x25(%rip),%eax # 401007 <d>
40102c: c3 retq
Note the 32-bit little-endian 2's complement encoding of the relative offsets in the RIP+rel32 addressing modes. (And the comment with the absolute address, added by objdump for convenience in this disassembly output.)
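As a rough illustration of that encoding, the following Python sketch decodes the four displacement bytes of the first add from the linked listing and recomputes the target address. The helper name is hypothetical; the only assumption is that RIP, for address calculation, is the address of the next instruction (here 0x401014 plus the 6-byte instruction length).

```python
# Decode a rel32 field as objdump prints it (little-endian two's complement).
def rip_relative_target(insn_addr, insn_len, disp_bytes):
    disp = int.from_bytes(disp_bytes, byteorder="little", signed=True)
    next_rip = insn_addr + insn_len  # RIP points past the current instruction
    return disp, next_rip + disp

disp, target = rip_relative_target(0x401014, 6, bytes.fromhex("e6ffffff"))
print(hex(disp), hex(target))  # -0x1a 0x401000

# Going the other way: the displacement needed to reach d from the last add
encoded = (0x401007 - (0x401026 + 6)).to_bytes(4, byteorder="little", signed=True)
print(encoded.hex(" "))  # db ff ff ff
```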
BTW, most assemblers including GAS have macro facilities, so you could have used a = 4 or .equ a, 4 to define it as an assemble-time constant, instead of emitting data into the output there. Then you'd use it as add $a, %eax, which would assemble to an add $sign_extended_imm8, r/m32 opcode.
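A small Python sketch of what that encoding choice looks like at the byte level, for an EAX destination only. The function is hypothetical and covers just two encodings of add into EAX: 83 /0 ib (sign-extended imm8) and 05 id (imm32). A real assembler handles far more cases.

```python
def encode_add_imm_eax(imm):
    """Encode `add $imm, %eax`, preferring the shorter sign-extended imm8 form."""
    if -128 <= imm <= 127:
        # 83 /0 ib: ModRM 0xC0 = mod 11, reg /0 (ADD), rm 000 (EAX)
        return bytes([0x83, 0xC0, imm & 0xFF])
    # 05 id: ADD EAX, imm32 with a little-endian immediate
    return bytes([0x05]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")

print(encode_add_imm_eax(4).hex(" "))       # 83 c0 04
print(encode_add_imm_eax(0x1234).hex(" "))  # 05 34 12 00 00
```

With the constant as an immediate, the value 4 lives inside the instruction bytes and no memory load happens at all.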
Also, all your loads are dword sized (determined by the register operand), so only 1 of them matches the size of the data directives you used. Single-step through your code and look at the high bits of EAX.
Assembly language doesn't really have variables. It has tools you can use to implement the high-level-language concept of variables, including variables with static storage class. (A label and some space in .data or .bss. Or .rodata for const "variables".)
But if you use the tools differently, you can do things like load 4 bytes that span the .byte, the .value (16-bit), and the first byte of the .long. So after the first add, you'll have EAX += 0x0c000704 (because x86 is little-endian). This is totally legal to write in assembler, and nothing is checking to enforce the concept of a variable as ending before the next label.
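To make the overlapping loads concrete, here is a Python sketch that models the 15 bytes emitted by the four data directives and the four dword loads the code performs. The names image, offsets and load_dword are mine; the layout assumes the directives are packed back to back with no padding, which matches the symbol offsets 0, 1, 3 and 7 in the disassembly.

```python
import struct

# Byte image of the four directives, packed back to back with no padding:
#   a: .byte 4   b: .value 7   c: .long 0x0C   d: .quad 9
image = struct.pack("<BHIQ", 4, 7, 0x0C, 9)
offsets = {"a": 0, "b": 1, "c": 3, "d": 7}

def load_dword(offset):
    """Model a 4-byte little-endian load, ignoring where the 'variable' ends."""
    return int.from_bytes(image[offset:offset + 4], "little")

eax = 0
for name, off in offsets.items():
    value = load_dword(off)
    eax = (eax + value) & 0xFFFFFFFF  # 32-bit wraparound, like add into EAX
    print(f"add {name}(%rip), %eax  loads {value:#010x}, eax = {eax:#010x}")
```

Under these assumptions the first add loads 0x0c000704 and the final EAX is 0x0c0c0720, rather than the 32 (0x20) you would get from 4 + 7 + 0x0C + 9.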
(Unless you use MASM, which does have variables; in that case you'd have had to write add eax, dword ptr [a]; without the size override MASM would complain about the mismatch between a dword register and a byte variable. Other flavours of asm syntax, like NASM and AT&T, assume you know what you're doing and don't try to be "helpful".)
Upvotes: 6