Reputation: 13
This is a program that gets passed a String as input.
I'm confused by the assembler code shown below, specifically line 6. This is what I understood from my research:
rbp-48 points to the stack address where argv is stored. (argv itself is the address of the start of the argv array.)
Adding 8 to the argv array address gives the address of the argv[1] entry.
Loading from that address gives argv[1]. (I understand that argv[1] itself stores another address, which points to a string.)
And now is where I get confused: if I wanted to access the first character in argv[1] and store it in the eax register, I would expect the assembler to do something like:
mov eax, BYTE PTR [rdx]
And if I needed to access the second character in argv[1] and store it in the eax register, I would expect the assembler to do something like:
mov eax, BYTE PTR [rdx+1]
But instead, I see the compiler does the following:
add rax, rdx
I cannot understand how this instruction makes rax point to any character in argv[1].
Below is the C code and the assembler code corresponding to the loop's instructions:
#include <string.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    int sum = 0;
    for(int i = 0; i < strlen(argv[1]); i++){
        sum += (int)argv[1][i];
    }
    return 0;
}
Assembly
mov rax, QWORD PTR [rbp-48]
add rax, 8
mov rdx, QWORD PTR [rax]
mov eax, DWORD PTR [rbp-24]
cdqe
add rax, rdx
movzx eax, BYTE PTR [rax]
movsx eax, al
add DWORD PTR [rbp-20], eax
add DWORD PTR [rbp-24], 1
Upvotes: 0
Views: 4886
Reputation: 12435
Oh, I finally figured out your confusion. At the point of the instruction in question, rax no longer contains argv; it was reloaded with the value of i, while rdx still holds argv[1]. So add rax, rdx computes argv[1] + i, the address of the i-th character. The compiler is using an add instruction instead of an indexed addressing mode.
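To make the mapping concrete, here is a rough C sketch that performs the same steps one at a time, with the matching instruction from the question in each comment. The helper name char_at is made up for illustration, and the sketch assumes an x86-64 target where pointers are 8 bytes and plain char is signed.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): computes (int)argv[1][i]
 * in the same order as the unoptimized assembly in the question. */
static int char_at(char **argv, int i) {
    char **slot = argv + 1;                 /* add rax, 8 (one 8-byte pointer past argv[0]) */
    char *str = *slot;                      /* mov rdx, QWORD PTR [rax]  -> rdx = argv[1] */
    int64_t idx = i;                        /* mov eax, DWORD PTR [rbp-24]; cdqe */
    char *p = str + idx;                    /* add rax, rdx  -> rax = argv[1] + i */
    unsigned char byte = (unsigned char)*p; /* movzx eax, BYTE PTR [rax] */
    return (signed char)byte;               /* movsx eax, al */
}
```

Under those assumptions, char_at(argv, i) returns the same value as (int)argv[1][i]. The add produces the address and the following movzx does the load, which is the two-step equivalent of a single load from [rdx+rax].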
eax is the lower 32 bits of rax. When eax is loaded, the value is zero-extended to 64 bits. Then cdqe sign-extends EAX into RAX, because i is a signed 32-bit integer that you're using to index a pointer. The compiler could have simplified this by loading i with movsx rax, dword ptr [rbp-24].
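The difference between the two extensions only shows up for negative values of i. A small sketch in C, with function names that are made up for illustration:

```c
#include <stdint.h>

/* Models a plain 32-bit register write such as mov eax, ...:
 * the upper 32 bits of the 64-bit register become zero. */
static uint64_t zero_extend32(int32_t i) {
    return (uint64_t)(uint32_t)i;
}

/* Models cdqe (or a sign-extending load):
 * the upper 32 bits are copies of bit 31, so the signed value is preserved. */
static int64_t sign_extend32(int32_t i) {
    return (int64_t)i;
}
```

For i = -1, zero_extend32 gives 4294967295 while sign_extend32 gives -1. Adding the zero-extended value to a pointer would land about 4 GiB past it instead of one byte before it, which is why the compiler sign-extends a signed index before the add.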
Upvotes: 2