Nick

x86 NASM convert string of chars to integer

This is a simple problem, but it is making my head spin. I need to convert a string of characters (input as a negative decimal number) into an unsigned integer. The rdi register holds the string to be converted, and the rax register will hold the result.

    xor rsi, rsi
    xor rax, rax
    xor dl, dl
    xor rdx, rdx
convert:
    mov dl, [rdi+rsi]    ;+rsi causes segmentation fault

    cmp dl, "-"
    jz  increment

    cmp dl, "."
    jz  dtoi_end

    sub dl, "0"

    mov rdx, 10
    mul rdx

    add rax, dl          ;invalid combination

    inc rsi
    jmp convert

increment:
    inc rsi
    jmp convert

convert_end:
    ret

  1. I need to iterate over each character, and I'm trying to do this by using the rsi register as an index. But every time I try this, I get a segmentation fault.

  2. Invalid combination error. I know this is because the registers are different sizes, but I am lost on how to keep adding the converted ASCII value into rax.

There is a similar question here that helped me understand the process better, but I have hit a wall: Convert string to int. x86 32 bit Assembler using Nasm


Answers (1)

Cody Gray

I need to iterate over each character, and I'm trying to do this by using the rsi register as an index. But every time I try this, I get a segmentation fault.

Based on the code you've shown, and the statement that RDI holds the address of the beginning of the string, I can see a couple of different reasons why you would be getting a segmentation fault in that load.

Perhaps the problem is that RDI contains an 8-character ASCII string (pass by value), rather than the address of a memory location that contains the string (pass by reference)?
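To make the difference concrete, here is a rough C model. It is an illustration only; the helper names are hypothetical.

It assumes that "pass by value" means the caller loaded up to 8 ASCII bytes straight into RDI, as a `mov rdi, [string]` would. In that case the characters live inside the register and have to be pulled out with shifts. Treating that value as an address, as `mov dl, [rdi + rsi]` does, dereferences whatever number the bytes happen to form.

```c
#include <stdint.h>

/* Hypothetical helper: emulate `mov rdi, [s]`, i.e. load 8 bytes into a
   64-bit register. x86 is little-endian, so s[0] lands in the lowest byte;
   the bytes are assembled explicitly so the model works on any host. */
static uint64_t load_qword(const char s[8])
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return r;
}

/* Pass by value: character i sits in bits 8*i .. 8*i+7 of the register
   itself, so it is extracted with a shift, not a memory load. */
static char char_from_value(uint64_t reg, int i)
{
    return (char)((reg >> (8 * i)) & 0xFF);
}

/* Pass by reference: the register holds an address, and character i is
   a memory load, like `mov dl, [rdi + rsi]`. */
static char char_from_pointer(const char *reg, int i)
{
    return reg[i];
}
```

For the string "-123", the by-value register would hold 0x3332312D. Used as a pointer, that is very unlikely to be a valid address, which would fault on the very first load.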

A more likely possibility is that it works fine for the first few iterations of the loop, but then you start reading past the end of the string because you aren't properly terminating the loop. There is no dtoi_end label in the code you've shown, and nothing ever jumps to the convert_end label. Are these supposed to be the same label? What happens if I pass in the string "-2"? When will your loop terminate? Looks to me like it won't!

You need some way to indicate that the entire string has been processed. There are a couple of common methods. One is using a sentinel terminator character at the end of the string, like C does with the ASCII NUL character. Inside your loop, you'd check whether the character being processed is 0 (NUL), and if so, jump out of the loop. Another option would be to pass the length of the string as an additional parameter to the function, like Pascal does with counted-length strings. Then you'd have a test inside the loop that checks whether you've processed enough characters yet, and if so, jumps out of the loop.
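Here is a sketch in C of the two loop shapes just described. It models what the assembly loop needs to do and is not a drop-in replacement. Both versions follow the question's rules: skip '-' and stop at '.'. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel-terminated (C-style): stop when the NUL byte is reached.
   In assembly: compare dl with 0 at the top of the loop and jump out. */
static uint64_t convert_nul_terminated(const char *s)
{
    uint64_t result = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '-') continue;   /* skip the sign */
        if (s[i] == '.') break;      /* stop at the decimal point */
        result = result * 10 + (uint64_t)(s[i] - '0');
    }
    return result;
}

/* Counted length (Pascal-style): stop after len characters.
   In assembly: compare the index with the length and jump out when done. */
static uint64_t convert_counted(const char *s, size_t len)
{
    uint64_t result = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '-') continue;
        if (s[i] == '.') break;
        result = result * 10 + (uint64_t)(s[i] - '0');
    }
    return result;
}
```

Either way, the loop has a guaranteed exit that doesn't depend on finding a '.' in the input.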

I'll try not to be too preachy about this, but you should have been able to detect this problem yourself by using a debugger. Step through the execution of the code line-by-line, watching the values of the variables/registers, and making sure you understand what is happening. This is basically what I did when analyzing your code, except I used my head as the debugger, "executing" the code in my own mind. It is much easier (and less error-prone) to let the computer do it, though, and that's why debuggers were invented. If you have code that isn't working, and you haven't stepped through it line-by-line in a debugger, you haven't worked hard enough to solve the problem yourself yet. In fact, single-stepping through every function you write is a good habit to get into because (A) it'll ensure that you understand the logic of what you've written, and (B) it'll help you find bugs.

Invalid combination error. I know this is because the registers are different sizes, but I am lost on how to keep adding the converted ASCII value into rax.

You have to make the sizes match. You could do add al, dl, but then you would be limiting the result to an 8-bit BYTE. That's probably not what you want. So, you need to make dl into a 64-bit QWORD, like rax. The obvious way to do that is to use the MOVZX instruction, which does zero extension. In other words, it "extends" the value to a larger size, filling the upper bits with 0s. That's what you want for unsigned values. For signed values, you need to do a sign-aware extension (that is, take the sign bit into account), and to do that, you would use the MOVSX instruction.

In code:

    movzx  rdx, dl
    add    rax, rdx
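The two kinds of extension can be modelled with C's integer conversions. Widening an unsigned 8-bit value zero-extends it. Widening a signed 8-bit value sign-extends it. This only illustrates the semantics, and the function names are made up.

```c
#include <stdint.h>

/* MOVZX: copy the 8-bit value and fill the upper 56 bits with 0s. */
static uint64_t movzx_8_to_64(uint8_t dl)
{
    return (uint64_t)dl;
}

/* MOVSX: copy the 8-bit value and fill the upper 56 bits with copies of
   bit 7 (the sign bit), so the signed value is preserved. */
static int64_t movsx_8_to_64(int8_t dl)
{
    return (int64_t)dl;
}
```

For a digit 0 to 9, bit 7 is clear, so both give the same result. MOVZX is the natural choice here because the value is unsigned. The two only differ for bytes 0x80 to 0xFF: 0xFF becomes 255 with MOVZX but -1 with MOVSX.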

Do be aware, as one of the commenters pointed out, that DL is simply the lowest 8 bits of the RDX register:

    | 63 - 32 | 31 - 16 | 15 - 8 | 7 - 0 |
    +---------+---------+--------+-------+
    |         |         |   DH   |  DL   |
    +---------+---------+--------+-------+
    |         |           EDX            |
    +---------+--------------------------+
    |                RDX                 |
    +------------------------------------+

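As such, it is redundant to do both xor dl, dl and xor rdx, rdx: the latter already accomplishes the former. Also, every time you modify dl, you're actually modifying the lowest 8 bits of rdx, and every time you modify rdx, you overwrite dl. In your loop, mov rdx, 10 replaces the digit you just computed in dl with 10. Then mul rdx stores the high 64 bits of the product in RDX, so the digit is gone before you ever try to add it. Hint, hint: this is something else that you'd have caught (although you might not understand why!) by single-stepping with a debugger.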
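The aliasing can be modelled in C by treating RDX as a 64-bit integer and DL as a view of its low byte. The sketch below uses hypothetical helper names. It also replays the first few instructions of the question's loop for one character, showing that after mov rdx, 10 the register holds 10 instead of the digit.

```c
#include <stdint.h>

/* Writing DL replaces only bits 7..0 of RDX; bits 63..8 are kept. */
static uint64_t write_dl(uint64_t rdx, uint8_t value)
{
    return (rdx & ~(uint64_t)0xFF) | value;
}

/* Reading DL looks only at the low 8 bits of RDX. */
static uint8_t read_dl(uint64_t rdx)
{
    return (uint8_t)(rdx & 0xFF);
}

/* Replays the start of the question's loop body for one character and
   returns what is left in DL afterwards. */
static uint8_t dl_after_question_loop(char ch)
{
    uint64_t rdx = 0;                                    /* xor rdx, rdx        */
    rdx = write_dl(rdx, (uint8_t)ch);                    /* mov dl, [rdi + rsi] */
    rdx = write_dl(rdx, (uint8_t)(read_dl(rdx) - '0'));  /* sub dl, "0"         */
    rdx = 10;               /* mov rdx, 10: overwrites all 64 bits, DL included */
    return read_dl(rdx);    /* the digit is gone; DL now holds 10 */
}
```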

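Furthermore, it is unnecessary to do xor rdx, rdx at all! You can accomplish the same task more efficiently with xor edx, edx. On x86-64, writing to a 32-bit register automatically zeroes the upper 32 bits of the corresponding 64-bit register, so this clears all of RDX. It is also one byte shorter, because it doesn't need a REX.W prefix.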
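That zero-extension rule is easy to state as a model. A 32-bit write discards the old upper half. An 8-bit or 16-bit write merges with the bits that were already there. The helper names below are hypothetical and only illustrate the architectural rule.

```c
#include <stdint.h>

/* x86-64 rule: writing a 32-bit register (EDX) zeroes bits 63..32 of the
   full register (RDX), so `xor edx, edx` clears all of RDX. */
static uint64_t write_edx(uint64_t rdx, uint32_t value)
{
    (void)rdx;               /* old contents are discarded entirely */
    return (uint64_t)value;  /* upper 32 bits become 0 */
}

/* By contrast, writing a 16-bit (or 8-bit) register keeps the other bits. */
static uint64_t write_dx(uint64_t rdx, uint16_t value)
{
    return (rdx & ~(uint64_t)0xFFFF) | value;
}
```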


Just for fun, here's one possible implementation of the code:

    ; Parameters: RDI == address of start of character string
    ;             RCX == number of characters in string
    ; Clobbers:   RDX, RSI
    ; Returns:    result is in RAX

    xor   esi, esi       ; index = 0
    xor   eax, eax       ; result = 0 (RAX must be cleared before accumulating)

convert:
    ; See if we've done enough characters by checking the length of the string
    ; against our current index.
    cmp   rsi, rcx
    jae   convert_end    ; unsigned compare: index and length are both unsigned

    ; Get the next character from the string.
    mov   dl, BYTE [rdi + rsi]

    cmp   dl, "-"
    je    increment

    cmp   dl, "."
    je    convert_end

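    ; Efficient way to multiply by 10: RAX = RAX + RAX (2x),
    ; then RAX = 4*RAX + RAX (5 * 2x = 10x).
    ; (Faster and less difficult to write than the MUL instruction,
    ; and unlike MUL, it doesn't clobber RDX.)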
    add   rax, rax
    lea   rax, [4 * rax + rax]

    sub   dl, "0"
    movzx rdx, dl
    add   rax, rdx

    ; (fall through to increment: no reason for redundant instructions!)

increment:
    inc   rsi            ; increment index/counter
    jmp   convert        ; keep looping

convert_end:
    ret

(WARNING: The logic of this is untested! I just rewrote your existing code in a more efficient way, without the bugs.)
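Since the routine above is untested, one way to check its logic on a host machine before assembling is a line-by-line C transliteration. The sketch below is written by hand, not generated from the assembly. It assumes s plays the role of RDI, len plays RCX, the loop index plays RSI, and RAX starts at zero. The name convert_model is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-written C model of the assembly routine, for testing the logic. */
static uint64_t convert_model(const char *s, size_t len)
{
    uint64_t rax = 0;                              /* RAX must start at 0 */
    for (size_t rsi = 0; rsi < len; rsi++) {       /* cmp rsi, rcx: stop at the length */
        unsigned char dl = (unsigned char)s[rsi];  /* mov dl, BYTE [rdi + rsi] */
        if (dl == '-')                             /* cmp dl, "-" / je increment */
            continue;
        if (dl == '.')                             /* cmp dl, "." / je convert_end */
            break;
        rax = rax + rax;                           /* add rax, rax: 2x */
        rax = 4 * rax + rax;                       /* lea rax, [4*rax + rax]: 10x */
        dl = (unsigned char)(dl - '0');            /* sub dl, "0" */
        rax += (uint64_t)dl;                       /* movzx rdx, dl / add rax, rdx */
    }
    return rax;                                    /* ret: result in RAX */
}
```

Unsigned 64-bit arithmetic in C wraps modulo 2^64 just like the hardware registers, so overflow behaves the same in the model as in the assembly.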
