Obi Won Kanobi
Obi Won Kanobi

Reputation: 11

How do I compare two strings in assembly (nasm)

REVISED CODE: Is there a way to optimize this?

read_pass:
    ;read passowrd

    ; read(int fd, void *buf, size_t count);
    ; #define __NR_read 0
    ; rdi = unsigned int fd
    ; rsi = char *buf
    ; rdx = size_t count
    xor rax, rax
    mov rdi, rax
    mov rsi, userpass
    mov rdx, rax
    add rdx, 0x64 ; 100 
    syscall

    lea rdi, [passcode]
    lea rsi, [userpass]
    mov rcx, pclen

    repe cmpsb 
    je do_something

    jmp read_pass

section .data
    passcode db 'hi', 0xa  
    pclen equ $ - passcode 
    userpass times 100 db 0
    uplen equ $ - userpass

ORIGINAL Q that the comments are replying to was asking about similar code, but using incorrect operands to cmps:

How do I compare two strings in assembly (nasm)?

When I compile I get the following error:
Pass.nasm:129: error: invalid combination of opcode and operands

And line 129 is:
cmpsq userpass, passcode

( I also tried cmp and cmps)

Upvotes: 0

Views: 5345

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 363980

Since this is a read/check password function, optimizing for speed on repeated calls makes no sense. Optimizing for code size (and lack of any major stalls on the first run) is the way to go, to minimize cache pollution (esp. the small and very valuable uop-cache). See http://agner.org/optimize/ (and some of the other resources linked from https://stackoverflow.com/tags/x86/info) for excellent info.

I did find some bugs / security holes in your code, and ways to save bytes. Also, keeping your read buffer on the stack would save 100 bytes of BSS space. See below.

It looks like you want to hard-code read(2) from stdin (fd 0), into a buffer of length 100. If you actually read only 99 characters, your string will still be zero-terminated, so I'd suggest doing that.

Loading addresses of global variables / arrays into registers in AMD64 is best done with mov r32, imm32, according to gcc/clang/icc. A RIP-relative lea is the best choice if you don't know that the address fits in the low32 bits of virtual memory, or if you need to make position-independent code. Data-section addresses are in the low32 in the Linux x86_64 programming model, so the 5-byte mov r32, imm32 works. mov r64, imm32 sign-extends the 32bit value. We don't want that, and it takes a REX prefix byte, so it's actually better (but more confusing to read) to load known-32bit addresses into 32bit registers. Obviously arbitrary addresses will be truncated if you do this. If unsure, use lea r64, [rel addr], and of course always use 64bit operand-size when working with addresses as function arguments or w/e.

If you do need to deal with 64bit addresses for globals, its maybe worth just loading it once, then saving/restoring it across the system call (in another a register it won't clobber, or actually push/pop since I think system calls clobber all the caller-saved registers. i.e. if we used rbx, we'd have to push/pop the caller's rbx at the start/end of our function, because it's a callee-saved register.

    xor eax, eax                ;  writing a 32bit reg always zeros the upper32, and saves a REX prefix byte
    xor edi, edi                ; read(fd 0)
    mov esi, userpass           ; lea rsi, [rel userpass]
    lea edx, [rax + uplen - 1]  ;  shorter and harder for humans to read than mov edx, uplen - 1
    syscall

    ; continued below

section .rodata
    ; passcode can be part of the shared read-only mapping of the executable, not copy-on-write.
    passcode db 'hi', 0xa    ; it's not normal to include the newline in the password, but it does make the code simpler I guess
    pclen equ $ - passcode

section .data
    userpass times 100 db 0
    uplen equ $ - userpass

Zeroing by moving from a zeroed register is also a 2-byte instruction, like xor. It might have a slight advantage on AMD CPUs, where it can run on more execution ports. On Intel, Sandybridge handles xor same,same in the register-rename stage, not using an execution unit at all, and giving it a throughput of 4 per clock. IDK if AMD will ever pick up this trick. It's not until IvyBridge that mov reg,reg is also handled at the register-rename stage of the pipeline, and also doesn't need an execution unit. Probably not measurable difference either way, since it's at the start of a short dependency chain, so I'd prefer the xor-zeroing just to make it easier to read (i.e. you don't have to remember that eax was zeroed when looking at xor edi,edi.)

To get the buffer length into a register, it might technically be shorter code to zero a reg and then add reg,imm8, but that's 2 Intel uops / AMD macro-ops, instead of just one for a mov reg, imm32 which is only one byte longer. (Thanks to automatic zeroing of the upper32 when writing a 32bit reg.) Actually, a decent way to save 2 bytes would be lea edx, [rax + uplen - 1], where rax is a reg that you just zeroed. lea with a signed-8bit displacement only takes 3 bytes to encode. In long mode, the default operand size is 32bit and the default address size is 64bit, which is why 32bit dest register and an addressing mode using 64bit register(s) is the most compact. Sometimes looking at objdump -d /bin/ls or something is the quickest way to check how many bytes it takes for a certain kind of instruction encoding, if you know what the rules are that make the instruction you want the same length as something you can find using other registers in a similar way.


Now let's look at your actual password-check code. First of all, it's normal to only store hashes of passwords, not the plaintext passwords themselves. Anyone considering actually using this code for any non-toy use should stop reading and go look that up. The more you can re-use well-tested libraries, the less risk of overlooking a security hole.

; continuing from above:
; ... syscall

test eax, eax       ; read(2) result in eax
jle  EOF_or_error   ; In C, most of the code in systems programming is checking for errors.

; lea rdi, [passcode]
; lea rsi, [userpass]
; If you use lea, make sure you use RIP-rel, because 64bit absolute addressing is only available for mov rax, [addr64].

mov edi, passcode   ; 5 bytes, see above discussion of loading addresses.
lea rsi, [rdi + userpass - passcode]  ; This is only 4 bytes.  3 bytes if dest is esi, not rsi. (no REX needed).

mov ecx, pclen   ; we know pclen < uplen, so this can't buffer overflow, but see text for security problems from not looking at length of read
repe cmpsb 
jne read_pass
; fall through to do_something, or to a ret insn.  Saves a jmp

This looks pretty reasonable now. Including the newline in the password is what lets you get away with not even checking the length of the password you read. You do need to check that you read something, or else you might just be comparing the password against the previous correct input, if read didn't touch any bytes in your buffer.

Actually, with the tty in line-buffered "cooked" input mode, read(2) returns what's been typed so far when you press ctrl-d (EOF), even when it doesn't include a newline. A subsequent call to read will read more. So you need to worry about that, as well as interrupted system calls (e.g. by a signal). This is one of the things library I/O functions handle for you.

Try with cat: you can type some characters, and then have them echoed without a newline by pressing ctrl-d. So this password routine has a huge security hole: If the previous correct password is sitting there in the buffer, all I have to do is guess the first character. I can make repeated guesses, just pressing ctrl-d after every character.

You'd avoid this problem if you zeroed the buffer (with mov eax, ecx / xor eax,eax / rep stosb, where eax is the read return value that you've checked is >= 0). That wipes the old password entry from memory as soon as its been checked. Of course, the correct password is just sitting there in plain text. If you don't care about passwords sitting around in memory, you could just check the number of characters read against length of the correct password.

; not shown: check for EOF/error

mov ecx, pclen
cmp ecx, eax    ; check lengths to avoid EOF first-char guessing
jne read_pass

; not shown: set up addresses

repe cmpsb      ; check contents
jne read_pass

; They match, do whatever here.

I don't see a clever way to only use one test or cmp instruction to check for zero / negative return value and check that


Another point: the password input buffer could be on the stack. If this code doesn't have to run unchanged on Windows, you don't even have to futz with RSP, you can just use the red-zone below the current stack pointer, which signal handlers won't clobber. Then you aren't wasting 100 bytes permanently for a buffer that's only used during password input. Since I've shown you should really be checking the return value of read anyway, the old contents don't matter, whether you have it on the stack or from malloc.


For speed on repeated strcmp calls, the startup overhead of rep cmpsb may make it worse than a normal loop for short strings. For memset/memcpy, I think the threshold where rep stos / rep movs are faster than an optimized SSE loop is about 128B or so, on Intel CPUs with Fast String Operations (IvB and later).

Upvotes: 1

Related Questions