Reputation: 11
This is a simple problem but it is making my head spin. I need to convert a string of characters (input as a negative decimal number) into an unsigned integer. The rdi register holds the string to be converted. The rax register will hold the result.
xor rsi, rsi
xor rax, rax
xor dl, dl
xor rdx, rdx
convert:
mov dl, [rdi+rsi] ;+rsi causes segmentation fault
cmp dl, "-"
jz increment
cmp dl, "."
jz dtoi_end
sub dl, "0"
mov rdx, 10
mul rdx
add rax, dl ;invalid combination
inc rsi
jmp convert
increment:
inc rsi
jmp convert
convert_end:
ret
I need to iterate over each character, and I'm trying to use this by using the rsi register. But every time I try this, I get a segmentation fault.
Invalid combination error. I know this is because the registers are different sizes, but I am lost on how to keep adding in the converted ascii value back into rax.
There is a similar question here that helped me understand the process better, but I have hit a wall: Convert string to int. x86 32 bit Assembler using Nasm
Upvotes: 1
Views: 2826
Reputation: 244981
I need to iterate over each character, and I'm trying to use this by using the rsi register. But every time I try this, I get a segmentation fault.
Based on the code you've shown, and the statement that RDI holds the address of the beginning of the string, I can see a couple of different reasons why you would be getting a segmentation fault in that load.
Perhaps the problem is that RDI contains an 8-character ASCII string (pass by value), rather than the address of a memory location that contains the string (pass by reference)?
Another, more likely possibility is that it works fine for the first few iterations of the loop, but then you start trying to read past the end of the string because you aren't properly terminating the loop. There is no dtoi_end label in the code you've shown, and no place where you actually jump to the convert_end label. Are these supposed to be the same label? What happens if I pass in the string "-2"? When will your loop terminate? Looks to me like it won't!
You need some way to indicate that the entire string has been processed. There are a couple of common methods. One is using a sentinel terminator character at the end of the string, like C does with the ASCII NUL character. Inside of your loop, you'd check to see if the character being processed is 0 (NUL), and if so, jump out of the loop. Another option would be to pass the length of the string as an additional parameter to the function, like Pascal does with counted-length strings. Then, you'd have a test inside of the loop that checks to see if you've processed enough characters yet, and if so, jumps out of the loop.
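To make the two termination strategies concrete, here is a minimal sketch in C rather than assembly, since the control flow is easier to read there. The function names parse_nul_terminated and parse_counted are made up for illustration, and both assume the input contains only '-', decimal digits, and '.', as in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel style: stop at the NUL byte (or at '.', as in the question).
   Assumes the input holds only '-', digits, and '.'. */
uint64_t parse_nul_terminated(const char *s) {
    uint64_t result = 0;
    for (size_t i = 0; s[i] != '\0' && s[i] != '.'; i++) {
        if (s[i] == '-') continue;   /* skip the sign, like the "increment" label */
        result = result * 10 + (uint64_t)(s[i] - '0');
    }
    return result;
}

/* Counted style: stop once `len` characters have been examined.
   The caller supplies the length, so no terminator is required. */
uint64_t parse_counted(const char *s, size_t len) {
    uint64_t result = 0;
    for (size_t i = 0; i < len && s[i] != '.'; i++) {
        if (s[i] == '-') continue;
        result = result * 10 + (uint64_t)(s[i] - '0');
    }
    return result;
}
```

Either way, the loop has an exit condition that does not depend on the string happening to contain a '.', which is what the "-2" case needs.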
I'll try not to be too preachy about this, but you should have been able to detect this problem yourself by using a debugger. Step through the execution of the code line-by-line, watching the values of the variables/registers, and making sure you understand what is happening. This is basically what I did when analyzing your code, except I used my head as the debugger, "executing" the code in my own mind. It is much easier (and less error-prone) to let the computer do it, though, and that's why debuggers were invented. If you have code that isn't working, and you haven't stepped through it line-by-line in a debugger, you haven't worked hard enough to solve the problem yourself yet. In fact, single-stepping through every function you write is a good habit to get into because (A) it'll ensure that you understand the logic of what you've written, and (B) it'll help you find bugs.
Invalid combination error. I know this is because the registers are different sizes, but I am lost on how to keep adding in the converted ascii value back into rax.
You have to make the sizes match. You could do add al, dl, but then you would be limiting the result to an 8-bit BYTE. That's probably not what you want. So, you need to make dl into a 64-bit QWORD, like rax. The obvious way to do that is to use the MOVZX instruction, which does zero extension. In other words, it "extends" the value to a larger size, filling the upper bits with 0s. That's what you want for unsigned values. For signed values, you need to do a sign-aware extension (that is, take the sign bit into account), and to do that, you would use the MOVSX instruction.
In code:
movzx rdx, dl
add rax, rdx
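If the difference between the two kinds of extension is unclear, the following C model may help. It is only a sketch of what MOVZX and MOVSX compute for an 8-bit source, and the helper names zero_extend8 and sign_extend8 are hypothetical.

```c
#include <stdint.h>

/* Like MOVZX: the upper 56 bits of the result are filled with zeros. */
uint64_t zero_extend8(uint8_t b) {
    return (uint64_t)b;
}

/* Like MOVSX: the upper 56 bits are copies of bit 7 (the sign bit).
   Written arithmetically so it does not rely on implementation-defined casts. */
int64_t sign_extend8(uint8_t b) {
    return (int64_t)b - ((b & 0x80) ? 256 : 0);
}
```

For a digit value in the range 0 to 9, both give the same answer. They only differ once bit 7 is set, for example 0xFF becomes 255 when zero-extended and -1 when sign-extended.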
Do be aware, as one of the commenters pointed out, that DL is simply the lowest 8 bits of the RDX register:
| 63 - 32 | 31 - 16 | 15 - 8 | 7 - 0 |
--------------------------------------
                    |   DH   |  DL   |
--------------------------------------
          |           EDX            |
--------------------------------------
|                RDX                 |
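As such, it is redundant to do both xor dl, dl and xor rdx, rdx. The latter accomplishes the former. Also, every time you modify dl, you're actually modifying the lowest 8 bits of rdx, and every time you modify rdx, you're overwriting dl. In your loop, mov rdx, 10 wipes out the digit you just computed in dl, and mul rdx then stores the upper 64 bits of its 128-bit product in rdx, so the add never sees the digit. Hint, hint: this is something else that you'd have caught (although you might not understand why!) by single-stepping with a debugger.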
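The aliasing can be modeled in a few lines of C. This is only an illustration of the register layout, with a plain 64-bit integer standing in for RDX. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of "mov dl, value": only bits 7..0 of the 64-bit register change. */
uint64_t write_low_byte(uint64_t rdx, uint8_t value) {
    return (rdx & ~(uint64_t)0xFF) | value;
}

/* Model of "mov rdx, value": all 64 bits are replaced, including DL. */
uint64_t write_full(uint64_t rdx, uint64_t value) {
    (void)rdx;               /* the old contents are discarded entirely */
    return value;
}

/* Model of reading DL back out of RDX. */
uint8_t read_low_byte(uint64_t rdx) {
    return (uint8_t)(rdx & 0xFF);
}
```

Storing the digit 7 with write_low_byte and then calling write_full with 10 leaves 10 in the low byte, not 7. That is the same thing that happens to the digit when the multiplier is loaded into rdx after the digit was placed in dl.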
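Furthermore, it is unnecessary to do xor rdx, rdx at all! You can accomplish the same task, more efficiently, by doing xor edx, edx. Writing to a 32-bit register implicitly zeroes the upper 32 bits of the corresponding 64-bit register, and the 32-bit form has a shorter encoding because it needs no REX prefix.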
Just for fun, here's one possible implementation of the code:
; Parameters: RDI == address of start of character string
; RCX == number of characters in string
; Clobbers: RDX, RSI
; Returns: result is in RAX
xor eax, eax ; clear the accumulator (the result is built up in RAX)
xor esi, esi ; clear the index
convert:
; See if we've done enough characters by checking the length of the string
; against our current index.
cmp rsi, rcx
jae convert_end ; unsigned comparison, since index and length are never negative
; Get the next character from the string.
mov dl, BYTE [rdi + rsi]
cmp dl, "-"
je increment
cmp dl, "."
je convert_end
; Efficient way to multiply by 10.
; (Faster and less difficult to write than the MUL instruction.)
add rax, rax
lea rax, [4 * rax + rax]
sub dl, "0"
movzx rdx, dl
add rax, rdx
; (fall through to increment---no reason for redundant instructions!)
increment:
inc rsi ; increment index/counter
jmp convert ; keep looping
convert_end:
ret
(WARNING: The logic of this is untested! I just rewrote your existing code in a more optimal way, without the bugs.)
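In case the multiply-by-10 trick looks like magic, here is the same arithmetic written out in C. It is a sketch only, and times10 is a made-up name. The two statements correspond to the add and lea instructions in the listing.

```c
#include <stdint.h>

/* 10 * x computed as 2x + 4 * (2x), mirroring the add/lea pair. */
uint64_t times10(uint64_t x) {
    uint64_t doubled = x + x;       /* add rax, rax           -> 2x  */
    return doubled * 4 + doubled;   /* lea rax, [4*rax + rax] -> 10x */
}
```

LEA can scale a register by 1, 2, 4, or 8 and add another register in a single instruction, which is why doubling first and then computing 4 * rax + rax gets to 10 in two steps.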
Upvotes: 2