Reputation: 26374
So I'm learning x86_64 NASM assembly on my Mac for fun. After hello world and some basic arithmetic, I tried copying a slightly more advanced hello world program from this site and modifying it for 64-bit Intel, but I can't get rid of this one error message:
hello.s:53: error: Mach-O 64-bit format does not support 32-bit absolute addresses
Here is the command I use to assemble and link:
nasm -f macho64 hello.s && ld -macosx_version_min 10.6 hello.o
And here is the relevant line:
cmp rsi, name+8
rsi is the register I am using for my index in the loop, and name is a quad word reserved for the user input which is the name, which by this point has already been written.
Here is a part of the code (to see the rest, click the link and go to the bottom, the only difference is that I use 64 bit registers):
loopAgain:
mov al, [rsi] ; al is a 1 byte register
cmp al, 0x0a ; if al holds an ascii newline...
je exitLoop ; then jump to label exitLoop
; If al does not hold an ascii newline...
mov rax, 0x2000004 ; System call write = 4
mov rdi, 1 ; Write to stdout = 1
mov rdx, 1 ; Size to write
syscall
inc rsi
cmp rsi, name+8 ; LINE THAT CAUSES ERROR
jl loopAgain
Upvotes: 6
Views: 3880
Reputation: 231203
The cmp instruction does not support a 64-bit immediate operand. As such, you cannot put a 64-bit immediate address reference in one of its operands. Instead, load name+8 into a register, then compare against that register.
You can see what instruction encodings are permitted in the Intel ISA manual (warning: huge PDF). As you can see in the entry for CMP, there are CMP r/m32, imm32 and CMP r/m64, imm32 encodings, which allow comparing a 32-bit immediate (sign-extended to 64 bits in the second form) with both 32-bit and 64-bit registers, but there is no CMP r/m64, imm64. There is, however, a MOV r64, imm64 encoding.
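As a rough illustration of those encodings, here is a small NASM fragment that uses plain numeric constants rather than symbols. The constants and the choice of registers are arbitrary, picked only for the example; it should assemble under any 64-bit output format.

```nasm
bits 64

    ; CMP r/m64, imm32: the immediate is sign-extended to 64 bits,
    ; so any value from -2^31 to 2^31 - 1 can be compared directly.
    cmp     rsi, 0x7fffffff

    ; 0x100000000 does not fit in a sign-extended 32-bit immediate,
    ; and there is no CMP r/m64, imm64, so load it with
    ; MOV r64, imm64 first and compare register to register.
    mov     rcx, 0x100000000
    cmp     rsi, rcx
```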
Or even better, use a RIP-relative LEA: put default rel at the top of the file, then use lea r64, [name+8]. This is smaller and more efficient than mov r64, imm64.
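A minimal sketch of the question's loop with that change applied might look like the following. Several things here are assumptions rather than the asker's actual code: the buffer is pre-filled in .data instead of being read from the user, the entry symbol is start (which matches the ld -macosx_version_min 10.6 command from the question; newer linkers may expect _main and -lSystem), and rcx is just one possible scratch register.

```nasm
default rel                     ; treat [symbol] operands as RIP-relative

section .data
name:   db "Ann", 0x0a, 0, 0, 0, 0   ; 8 bytes standing in for the user's input

section .text
global start
start:
    lea     rsi, [name]         ; rsi = address of the current byte

loopAgain:
    mov     al, [rsi]           ; al is a 1-byte register
    cmp     al, 0x0a            ; stop at an ASCII newline
    je      exitLoop

    mov     rax, 0x2000004      ; system call write = 4
    mov     rdi, 1              ; write to stdout = 1
    mov     rdx, 1              ; size to write
    syscall                     ; note: syscall clobbers rcx and r11

    inc     rsi
    lea     rcx, [name+8]       ; end of the buffer, computed relative to RIP
    cmp     rsi, rcx            ; register-to-register compare, no immediate
    jb      loopAgain           ; addresses are unsigned, so jb rather than jl

exitLoop:
    mov     rax, 0x2000001      ; system call exit = 1
    xor     edi, edi            ; exit status 0
    syscall
```

The end address is reloaded inside the loop, after the syscall, because the syscall instruction overwrites rcx. Keeping it in a register that survives the call and loading it once before the loop would work as well.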
Since nasm is crashing, the failure of MOV rcx, name+8 is just plain a bug in nasm. Please report it to the nasm devs (after making sure you're using the latest version of nasm; also, check that this patch doesn't fix the problem). In any case, though, one workaround would be to add a symbol for the end of name:
name:
resb 8
name_end:
Now simply use MOV rcx, name_end. This has the advantage that you don't need to update every reference when the size of name changes. Alternatively, you could use a different assembler, such as the clang or GNU binutils assemblers.
Discussion in comments points out that Linux can use a symbol address as a 32-bit immediate. This is true only in non-PIE executables which are linked with a base address in the low 2GiB of virtual address space. But macOS chooses to put the image base address above 4GiB, so you can't use mov r32, imm32 or cmp r64, sign_extended_imm32 with symbol addresses.
Upvotes: 4
Reputation: 213378
I believe the problem you are facing is simple: the Mach-O format mandates relocatable code, which means that data has to be accessed not by absolute address but by relative address. That is, the assembler can't resolve name to a constant because it's not a constant; the data could be at any address.
Now that you know that the address of data is relative to the address of your code, see if you can make sense of the output from GCC. For example,
static unsigned global_var;
unsigned inc(void)
{
return ++global_var;
}
_inc:
mflr r0 ; Save old link register
bcl 20,31,"L00000000001$pb" ; Jump
"L00000000001$pb":
mflr r10 ; Get address of jump
mtlr r0 ; Restore old link register
addis r2,r10,ha16(_global_var-"L00000000001$pb") ; Add offset to address
lwz r3,lo16(_global_var-"L00000000001$pb")(r2) ; Load global_var
addi r3,r3,1 ; Increment global_var
stw r3,lo16(_global_var-"L00000000001$pb")(r2) ; Store global_var
blr ; Return
Note that this is on PowerPC, because I don't know the Mach-O ABI for x86-64. On PowerPC, you do a jump, saving the program counter, and then do arithmetic on the result. I believe something completely different happens on x86-64.
(Note: If you look at GCC's assembly output, try looking at it with -O2. I don't bother looking at -O0 because it's too verbose and more difficult to understand.)
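For comparison, here is a hand-written NASM sketch of the same inc() function for x86-64 Mach-O. It is not compiler output, only an assumption about what a straightforward version could look like: instead of the jump-and-add sequence used on PowerPC, x86-64 can address the variable relative to the instruction pointer directly.

```nasm
default rel                     ; [global_var] is assembled as [rel global_var]

section .bss
global_var: resd 1              ; static unsigned global_var; (zero-initialized)

section .text
global _inc                     ; C symbol names get a leading underscore on Mach-O

; unsigned inc(void) { return ++global_var; }
_inc:
    mov     eax, [global_var]   ; load through a RIP-relative address
    add     eax, 1              ; increment
    mov     [global_var], eax   ; store back, again RIP-relative
    ret                         ; the result is returned in eax
```

To try it, the object file would need to be linked with a C caller that declares unsigned inc(void).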
My recommendation? Unless you are writing a compiler (and sometimes even then), write your assembly functions in one of two ways:
This will generally be more portable as well, since you will rely less on certain details of the ABI. But the ABI is still important! If you don't know and follow the ABI, you'll cause errors that are fairly difficult to detect. For instance, years ago there was a bug in LibSDL assembly code which caused libc's memcpy (also assembly) to copy the wrong data under some very specific circumstances.
Upvotes: 3