Reputation: 1
I'm reading the book Learn to Program with Assembly by Jonathan Bartlett. I'm in chapter 7, on data records. The author introduces structs and records in assembly. He creates a simple person record in the file persondata.s, which holds an array of 6 people and their characteristics: weight, hair colour, height and age.
persondata.s
.section .data
.globl people, numpeople
numpeople:
    # Calculate the number of people in array
    .quad (endpeople - people)/PERSON_RECORD_SIZE
people:
    # Array of people
    .quad 200, 2, 74, 20
    .quad 280, 2, 72, 44 # me!
    .quad 150, 1, 68, 30
    .quad 250, 3, 75, 24
    .quad 250, 2, 70, 11
    .quad 180, 5, 69, 65
endpeople: # Marks the end of the array for calculation purposes

# Describe the components of the struct
.globl WEIGHT_OFFSET, HAIR_OFFSET, HEIGHT_OFFSET, AGE_OFFSET
.equ WEIGHT_OFFSET, 0
.equ HAIR_OFFSET, 8
.equ HEIGHT_OFFSET, 16
.equ AGE_OFFSET, 24

# Total size of the struct
.globl PERSON_RECORD_SIZE
.equ PERSON_RECORD_SIZE, 32
This file contains only the data records. At the end of the file there is a constant, PERSON_RECORD_SIZE, which gives the size in bytes of a single person record in the array. It's later used in a loop to advance to the next person, specifically to that person's height. He then wrote a program, tallest.s, that returns the largest height value.
tallest.s
.globl _start
.section .text
_start:
    ### Initialize Registers ###
    # Pointer to first record
    leaq people, %rbx
    # Record count
    movq numpeople, %rcx
    # Tallest value found
    movq $0, %rdi

    ### Check Preconditions ###
    # If there are no records, finish
    cmpq $0, %rcx
    je finish

    ### Main Loop ###
mainloop:
    # %rbx is the pointer to the whole struct
    # This instruction grabs the height field
    # and stores it in %rax
    movq HEIGHT_OFFSET(%rbx), %rax
    # If it is less than or equal to our current
    # tallest, go to the next one.
    cmpq %rdi, %rax
    jbe endloop
    # Copy this value as the tallest value
    movq %rax, %rdi
endloop:
    # Move %rbx to point to the next record
    addq $PERSON_RECORD_SIZE, %rbx
    # Decrement %rcx and do it again
    loopq mainloop

    ### Finish it off ###
finish:
    movq $60, %rax
    syscall
I understand everything in this program except one thing. We access the person's height using movq HEIGHT_OFFSET(%rbx), %rax, and I understand that. But when it comes to moving to the next person, and specifically to that person's height, he uses addq $PERSON_RECORD_SIZE, %rbx. Here is my question: if this line just adds 32 to rbx to move to the next person and their height value in memory, why is there a $ sign before the constant name? I thought direct memory mode would be appropriate here. We use it at the beginning, in movq numpeople, %rcx, to move the number of people into the rcx register.
I've put $32 instead of $PERSON_RECORD_SIZE and it works fine. But when I put PERSON_RECORD_SIZE without the $, there is a segmentation fault. I don't understand this. There seems to be some inconsistency. Is it some other addressing mode that I'm not aware of? (Keep in mind I've been learning assembly for a couple of weeks now and I'm not a software engineer by profession.) I'm sure it's some detail I'm missing.
Upvotes: 0
Views: 39
Reputation: 363902
.equ foo, 32 defines an assemble-time constant, not assembling any bytes in the current section of the output.
Actually it defines a symbol like a foo: label would, but with "address" 32. That's why in GAS you're able to export it with .globl foo so it's visible to the linker in the symbol table of the .o file.
add $foo, %rbx adds 32, the "value" (aka address) of the symbol.
add foo, %rbx would load from absolute address 32, using the symbol as the address of a memory operand.
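To make the difference concrete, here is a minimal sketch of a standalone program, assuming x86-64 Linux and the GNU toolchain. The file name and the symbol RECORD_SIZE are made up for illustration; they are not from the book.

```asm
# immediate_vs_memory.s (hypothetical file name)
# Build and run, assuming GNU as and ld on x86-64 Linux:
#   as immediate_vs_memory.s -o immediate_vs_memory.o
#   ld immediate_vs_memory.o -o immediate_vs_memory
#   ./immediate_vs_memory; echo $?

.equ RECORD_SIZE, 32            # hypothetical constant with "address" 32

.globl _start
.section .text
_start:
    movq $0, %rdi

    # With $: the symbol's value is used as an immediate, so this adds 32.
    addq $RECORD_SIZE, %rdi

    # Without $: the symbol's value is used as a memory address, so this
    # would load 8 bytes from absolute address 32. Nothing is normally
    # mapped there, so uncommenting it should end in a segmentation fault.
    # addq RECORD_SIZE, %rdi

    # exit(%rdi): the exit status is 32
    movq $60, %rax
    syscall
```

That commented-out line is the same situation as writing addq PERSON_RECORD_SIZE, %rbx in tallest.s.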
In both cases, the symbol "address" becomes part of the machine code of the instruction being assembled; the difference is only whether the opcode is one that uses it as an immediate or as a memory operand. That's also true if you do .quad foo to emit 8 bytes with that value (address). This applies regardless of whether the symbol address aka value is labeling a position in some section, or is an integer defined with .equ or foo = 123 (alternate syntax for the same thing).
Unfortunately when assembling add $foo, %rbx, the assembler doesn't know the value yet (it's only a link-time constant since it's an undefined symbol in this file). So it picks add $imm32, %rbx, making the instruction 3 bytes larger in machine code than add $32, %rbx, where the assembler would see the small value at assemble time and be able to pick the 8-bit immediate encoding (https://www.felixcloutier.com/x86/add). For this reason, I wouldn't recommend using .equ across asm source files. Use the C preprocessor so you can #define foo 32 in a .h that you #include in both files.
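A rough sketch of the preprocessor approach, assuming you build with gcc, which runs the C preprocessor on files ending in capital .S. The file name is hypothetical, and to keep the example in one file the #define sits at the top rather than in a shared .h. Comments use /* */ because a line starting with # can be mistaken for a preprocessor directive.

```asm
/* record_size.S (hypothetical file name)
 * Build and run, assuming gcc on x86-64 Linux:
 *   gcc -nostdlib -static record_size.S -o record_size
 *   ./record_size; echo $?
 */

/* In a real project this line would live in a header that both
 * source files #include. */
#define PERSON_RECORD_SIZE 32

    .globl _start
    .text
_start:
    movq $0, %rdi

    /* The preprocessor rewrites this to addq $32, %rdi before the
     * assembler runs, so the assembler sees a plain small number. */
    addq $PERSON_RECORD_SIZE, %rdi

    /* exit(%rdi): the exit status is 32 */
    movq $60, %rax
    syscall
```

If you want to check the encoding yourself, disassembling the result with objdump -d and comparing it against a build that uses a .globl'd .equ from another file should show the size difference.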
(The difference between a symbol labeling an address in some section vs. an absolute constant (section *ABS*) actually creates ambiguity in GNU assembler .intel_syntax noprefix mode, but that applies even for code that uses it earlier in the same source file than the .equ, not just across source files: see Distinguishing memory from constant in GNU as .intel_syntax. There's actually one ambiguity even in AT&T syntax, but only in a corner case that's not useful.)
The "value" of a symbol is its address. (A rough analogy is extern char foo[], so writing foo is the address, but AT&T syntax dereferences bare symbol names implicitly in instructions; the analogy works better in NASM.) If there are bytes in memory at that address, you can use the symbol to assemble instructions that will at runtime access them, but you can't get the bytes into an immediate embedded in the machine code, or use it to control a .rept or anything like that.
Variables are a high-level language concept which you can implement in assembly using labels to define symbols ahead of directives like .byte to emit some static storage, such as bar: .byte 123. The symbol bar has a "value" which other instructions can refer to at assemble/link time (e.g. to generate a byte-load from 4 bytes later, like movzbl bar+4(%rip), %eax).
An asm source line involving the symbol foo will assemble bytes into the output that include the symbol's "address" (or things based on it, like for relative addressing, or mov $foo-bar, %eax for distance between two symbols.)
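As a small illustration of the distance-between-symbols case, here is a sketch assuming x86-64 Linux with GNU as and ld. The labels table_start and table_end and the file name are hypothetical.

```asm
# symbol_distance.s (hypothetical file name)
# Build and run:
#   as symbol_distance.s -o symbol_distance.o
#   ld symbol_distance.o -o symbol_distance
#   ./symbol_distance; echo $?

.section .data
table_start:                    # hypothetical label
    .quad 10, 20, 30
table_end:                      # hypothetical label

.globl _start
.section .text
_start:
    # Both labels are defined in the same section of this file, so the
    # difference of their addresses is a plain number:
    # 3 quads * 8 bytes = 24. It ends up as an immediate in the
    # instruction; the stored values 10, 20, 30 are never read.
    movl $table_end-table_start, %edi

    # exit(%rdi): the exit status is 24
    movq $60, %rax
    syscall
```

This is the same trick persondata.s uses in (endpeople - people)/PERSON_RECORD_SIZE.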
But you can't make the assembler reference the 123 byte or any other bytes that happen to be at or near the address the symbol is attached to. Access to bytes assembled into the output can only happen at run-time. E.g. .quad bar emits the 64-bit absolute address, and .byte bar tries to fit the absolute address into a byte but will fail at link time (unless you had a linker script that put your .data section at the very bottom of address space!). If you wanted to avoid repeating yourself and hard-coding 123 in multiple places, you'd need to define an assemble-time constant like .equ barval, 123 and use that wherever the value is needed, like .byte barval.
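Something like the following sketch shows one constant feeding both the data and an instruction. It assumes x86-64 Linux with GNU as and ld; the file name is hypothetical, and bar and barval are the names used above.

```asm
# shared_constant.s (hypothetical file name)
# Build and run:
#   as shared_constant.s -o shared_constant.o
#   ld shared_constant.o -o shared_constant
#   ./shared_constant; echo $?

.equ barval, 123                # assemble-time constant, emits no bytes

.section .data
bar:
    .byte barval                # one byte of static storage holding 123

.globl _start
.section .text
_start:
    # Run-time access: load the byte stored at bar, zero-extended.
    movzbl bar(%rip), %edi

    # Assemble-time use: the same constant as an immediate.
    subl $barval, %edi

    # exit(%rdi): 123 - 123, so the exit status is 0
    movq $60, %rax
    syscall
```

The value 123 is written once; the load goes through the symbol bar at run-time, while the immediate comes from barval at assemble time.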
In 64-bit code you'd normally never write add symbol, %reg; the 32-bit absolute addressing mode is not efficient or useful for anything except maybe MMIO at some absolute address, not for your own .data. You always want RIP-relative addressing, add symbol(%rip), %reg, if symbol is the address of static storage. Using symbol addresses as 32-bit absolute is useful when used with other registers as an array index, as in add my_array(,%rdx,8), %rax or something, but see 32-bit absolute addresses no longer allowed in x86-64 Linux? (you can't do that in a PIE executable.)
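Here is a minimal sketch of both patterns in a form that avoids 32-bit absolute addresses, assuming x86-64 Linux with GNU as and ld. The label static_value and the file name are hypothetical; my_array is the name used above.

```asm
# rip_relative.s (hypothetical file name)
# Build and run:
#   as rip_relative.s -o rip_relative.o
#   ld rip_relative.o -o rip_relative
#   ./rip_relative; echo $?

.section .data
static_value:                   # hypothetical label
    .quad 10
my_array:
    .quad 10, 20, 30

.globl _start
.section .text
_start:
    movq $2, %rdi

    # Scalar in static storage: RIP-relative memory operand.
    addq static_value(%rip), %rdi       # %rdi = 2 + 10 = 12

    # Indexed access: get the array address RIP-relative first,
    # then index from that register.
    movq $2, %rdx                       # index of the third element
    leaq my_array(%rip), %rsi
    addq (%rsi,%rdx,8), %rdi            # %rdi = 12 + 30 = 42

    # exit(%rdi): the exit status is 42
    movq $60, %rax
    syscall
```

The leaq plus register-indexed form costs one extra instruction compared to my_array(,%rdx,8), but it doesn't depend on the array living at a 32-bit absolute address.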
See also Referencing the contents of a memory location. (x86 addressing modes) re: x86-64's selection of addressing modes.
And How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more detail about the difference in meaning for foo(%rip) vs. 123(%rip).
Upvotes: 1