Reputation: 413
I have an initialized string "Hello, World!" from which I would like to extract the first character (i.e. 'H') and compare it to a character that is passed into a register at run time.
I have tried comparing the first character of "Hello, World!" with 'H' through the following code:
global start
section .data
msg: db "Hello, World!", 10, 0
section .text
start:
mov rdx, msg
mov rdi, [rdx]
mov rsi, 'H'
cmp rdi, rsi
je equal
mov rax, 0x2000001
mov rdi, [rdx]
syscall
equal:
mov rax, 0x2000001
mov rdi, 58
syscall
However, this code terminates without jumping to the equal label. Moreover, the exit status of my program is 72, which is the ASCII code for H. This made me try to pass 72 into rsi instead of 'H', but that too resulted in a program that terminates without jumping to the equal label.
How can I properly compare the first character in "Hello, World!" with a character that is passed to a register?
Upvotes: 2
Views: 3211
Reputation: 365882
Both your code and @Rafael's answer massively over-complicate this.
You normally never want to use mov rdi, msg with a 64-bit immediate of the absolute address. (See Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array) Use default rel and use cmp byte [msg], 'H'. Or if you want the pointer in RDI so you can increment it in a loop, use lea rdi, [rel msg].
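Since the question asks about a character that arrives in a register at run time, here is a minimal sketch of comparing it directly against the first byte in memory, with no separate load. It assumes the character is in SIL (the low byte of RSI); the mov esi, 'H' line is only a stand-in for however it really gets there, and the exit status of 0 on a mismatch is an arbitrary choice for illustration.

```nasm
default rel                  ; [msg] becomes RIP-relative
global start

section .rodata
msg:    db "Hello, World!", 10, 0

section .text
start:
    mov   esi, 'H'           ; stand-in (assumption): the run-time character, in SIL
    xor   edi, edi           ; exit status 0 = no match (arbitrary choice for this sketch)
    cmp   [msg], sil         ; 1-byte compare; the operand size is implied by SIL
    jne   .done
    mov   edi, 58            ; match: same status the question's equal branch uses
.done:
    mov   eax, 0x2000001     ; exit syscall number, as in the question
    syscall
```

Because one operand is a byte register, the memory operand needs no byte size override here; with an immediate, as in cmp byte [msg], 'H', it does.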
The only thing that's different between your branches is the RDI value. You don't need to duplicate the RAX setup or the syscall, just get the right value in RDI and then have the branches rejoin each other. (Or do it branchlessly.)
@Rafael's answer is still loading 8 bytes from the string for some reason, like both loads in your question. Presumably this is sys_exit and it ignores the upper bytes, only setting process exit status from the low byte, but just for fun let's pretend we actually want all 8 bytes loaded for the syscall while only comparing the low byte.
default rel ; use RIP-relative addressing modes by default for [label]
global start
section .rodata ;; read-only data usually belongs in .rodata
msg: db "Hello, World!", 10, 0
section .text
start:
mov rdi, [msg] ; 8 byte load from a RIP-relative address
mov ecx, 'H'
cmp dil, cl ; compare the low byte of RDI (dil) with the low byte of RCX (cl)
jne .notequal
;; fall through on equal
mov edi, 58
.notequal: ; .labels are local labels in NASM
; mov rdi, [rdx] ; still loaded from before; we didn't destroy it.
mov eax, 0x2000001
syscall
Avoid writing to AH/BH/CH/DH when possible. It either has a false dependency on the old value of RAX/RBX/RCX/RDX, or it can cause partial-register merging stalls if you later read the full register. @Rafael's answer doesn't read the full register afterward, but its mov ah, 'H' is dependent on the load into AL on some CPUs. See Why doesn't GCC use partial registers? and How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent. On Haswell/Skylake, mov ah, 'H' has a false dependency on the old value of AH, even though AH is renamed separately from RAX. AL isn't renamed separately, so yes, this might well have a false dependency on the load, stopping it from running in parallel and delaying the cmp by a cycle.
Anyway, the TL;DR here is that you shouldn't mess around with writing AH/BH/CH/DH if you don't need to. Reading them is often OK, but can have worse latency. And note that cmp dil, ah isn't encodable, because DIL is only accessible with a REX prefix and AH is only accessible without.
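To make the encoding restriction concrete, here is a small sketch of one possible workaround: copy AH into a register whose low byte can be used alongside a REX prefix, then compare against that. The register setup values and the choice to report the result as an exit status of 1 or 0 are assumptions made purely for illustration.

```nasm
global start

section .text
start:
    mov   eax, 0x4800        ; illustrative setup: AH = 0x48 ('H'), AL = 0
    mov   edi, 'H'           ; illustrative setup: DIL = 'H'

    ; cmp dil, ah            ; not encodable: DIL needs REX, AH can't be used with REX
    movzx ecx, ah            ; no REX needed here; ECX = zero-extended AH
    xor   edx, edx           ; clear EDX before the cmp, since xor modifies flags
    cmp   dil, cl            ; REX-prefixed compare of two low-byte registers
    sete  dl                 ; DL = 1 if the bytes were equal, else 0

    mov   edi, edx           ; exit status = comparison result (arbitrary choice)
    mov   eax, 0x2000001     ; exit syscall number, as in the question
    syscall
```

This only reads AH, which fits the advice above that reading the high-byte registers is often OK while writing them is best avoided.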
I picked RCX instead of RSI because CL doesn't need a REX prefix, but since we need to look at the low byte of RDI (dil) we need a REX prefix anyway on the cmp. I could have used mov cl, 'H' to save code size, because there's probably no problem with a false dependency on the old value of RCX.
BTW, cmp dil, 'H' would work just as well as cmp dil, cl.
Or if we load the byte with zero-extension into the full RDI, we can use cmp edi, 'H' instead of the low-8 version of it. (Zero-extending loads are the normal / recommended way to deal with bytes and 16-bit integers on modern x86-64. Merging into the low byte of the old register value is usually worse for performance, which is the reason behind Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?.)
And instead of branching, we could CMOV. This is sometimes better, sometimes not, for code-size and performance.
Version 2, only actually loading 1 byte:
start:
movzx edi, byte [msg] ; 1 byte load, zero extended to 4 (and implicitly to 8)
mov eax, 58 ; ASCII ':'
cmp edi, 'H'
cmove edi, eax ; edi = (edi == 'H') ? 58 : edi
; rdi = 58 or the first byte,
; unlike in the other version where it had 8 bytes of string data here
mov eax, 0x2000001
syscall
(This version looks a lot shorter, but most of the extra lines were whitespace, comments, and labels. Optimizing to cmp-immediate makes this 4 instructions instead of 5 before the mov eax / syscall, but other than that they're equal.)
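For the case mentioned earlier where you want the pointer in RDI so you can increment it in a loop, a rough sketch might look like the following. It scans the string for a character assumed to be in SIL and exits with the index of the first match. The mov esi, 'W' stand-in, the 255 "not found" status, and the use of the index as the exit status are all assumptions for illustration, not something from the question.

```nasm
default rel
global start

section .rodata
msg:    db "Hello, World!", 10, 0

section .text
start:
    mov   esi, 'W'           ; stand-in (assumption): the run-time character, in SIL
    lea   rdi, [msg]         ; RIP-relative LEA: RDI = address of the first byte
.next:
    movzx eax, byte [rdi]    ; zero-extending 1-byte load of the current character
    test  eax, eax
    jz    .notfound          ; reached the terminating 0 byte
    cmp   al, sil
    je    .found
    inc   rdi                ; advance to the next byte
    jmp   .next
.notfound:
    mov   edi, 255           ; arbitrary "no match" status for this sketch
    jmp   .exit
.found:
    lea   rax, [msg]
    sub   rdi, rax           ; RDI = index of the matching byte
.exit:
    mov   eax, 0x2000001     ; exit syscall number, as in the question
    syscall
```

As in the versions above, both outcomes rejoin at a single mov eax / syscall instead of duplicating it.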
Upvotes: 4
Reputation: 7746
I'll explain the changes side-by-side (hopefully that's easier to follow):
global start
section .data
msg: db "Hello, World!", 10, 0
section .text
start:
mov rdx, msg
mov al, [rdx] ; moves one byte from msg, H to al, the 8-bit lower part of ax
mov ah, 'H' ; move constant 'H' to the 8-bit upper part of ax
cmp al, ah ; compares H with H
je equal ; yes, they are equal, so go to address at equal
mov rax, 0x2000001
mov rdi, [rdx]
syscall
equal: ; here we are
mov rax, 0x2000001
mov rdi, 58
syscall
If you're not familiar with al, ah, and ax, please see General-Purpose Registers.
Upvotes: 1