Why am I adding rax and rdx?

Question

This program is passed a string as a command-line argument.

I'm confused by the assembly code shown below, specifically line 6 (add rax, rdx). This is what I understood from my research:

[rbp-48] is the stack slot where argv is stored. (argv itself is the address of the start of the argv array.)
The rax register now holds the address of the argv array.
We then add 8 to rax, so rax now holds the address of argv[1]. (I understand that argv[1] itself holds another address, which points to a string.)
We then load the value stored in argv[1] into the rdx register, so rdx now holds the address where the string begins.
We then move the loop counter i, stored at [rbp-24], into the eax register.
We then have a cdqe instruction, which I believe is not relevant.

And this is where I get confused: if I wanted to access the first character of argv[1] and store it in the eax register, I would expect the compiler to emit something like:

mov   eax, BYTE PTR [rdx]

And if I needed to access the second character of argv[1] and store it in the eax register, I would expect something like:

mov   eax, BYTE PTR [rdx+1]

But instead, I see the compiler does the following:

add     rax, rdx

As I read it, this adds the address where the string begins to the address where the pointer to the start of the string is stored, and saves the result in rax.

I cannot understand how this instruction makes rax point to any character in argv[1].

Below is the C code and the assembler code corresponding to the loop's instructions:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    int sum = 0;
    for (int i = 0; i < strlen(argv[1]); i++) {
        sum += (int)argv[1][i];
    }
    return 0;
}

Assembly

mov     rax, QWORD PTR [rbp-48]
add     rax, 8
mov     rdx, QWORD PTR [rax]
mov     eax, DWORD PTR [rbp-24]
cdqe
add     rax, rdx
movzx   eax, BYTE PTR [rax]
movsx   eax, al
add     DWORD PTR [rbp-20], eax
add     DWORD PTR [rbp-24], 1

prl · Accepted Answer

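Oh, I finally figured out your confusion. At the point of the instruction in question, rax no longer contains the address of argv[1]; it was reloaded with the value of i by mov eax, DWORD PTR [rbp-24]. So add rax, rdx computes i + argv[1], which is the address of argv[1][i]. The compiler is using a separate add instruction instead of an indexed addressing mode such as [rdx+rax].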
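To see why that sum is the right address, here's a rough C sketch that mirrors the loop body roughly one statement per instruction. The helper name load_char and the local variable names are made up for illustration, and it assumes x86-64, where a pointer is 8 bytes and plain char is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: each statement models one step of the assembly. */
static int load_char(char **argv, int i)
{
    assert(argv != NULL && argv[1] != NULL);

    char **slot = argv;        /* mov rax, QWORD PTR [rbp-48]  (rax = argv)          */
    slot = slot + 1;           /* add rax, 8                   (rax = &argv[1])      */
    char *rdx = *slot;         /* mov rdx, QWORD PTR [rax]     (rdx = argv[1])       */
    int64_t rax = (int64_t)i;  /* mov eax, DWORD PTR [rbp-24]; cdqe  (rax = i)       */
    char *addr = rdx + rax;    /* add rax, rdx                 (rax = argv[1] + i)   */
    char c = *addr;            /* movzx eax, BYTE PTR [rax]    (load one byte)       */
    return (int)c;             /* movsx eax, al                (widen char to int)   */
}
```

For example, with argv = {"prog", "AB"}, load_char(argv, 1) reads the byte at argv[1] + 1, which is 'B'.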

eax is the lower 32 bits of rax. Writing to eax zero-extends the value into the full 64-bit rax, so the upper 32 bits are cleared.

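cdqe then sign-extends eax into rax, because i is a signed 32-bit integer that you're using to index a pointer, and the index has to be widened to 64 bits before it can be added to an address. The compiler could have simplified this by loading with
movsx rax, dword ptr [rbp-24]
(Intel's manual and NASM spell this 32-to-64-bit form movsxd.)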
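If the difference between the two kinds of extension is unclear, this small sketch shows what each one does to a 32-bit value. The function names are made up for illustration; the casts only model the register behavior and are not what the compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Models "mov eax, ...": the upper 32 bits of rax end up as zero. */
static uint64_t zero_extend32(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}

/* Models "cdqe": the sign bit of eax is copied into the upper 32 bits of rax. */
static int64_t sign_extend32(int32_t v)
{
    return (int64_t)v;
}
```

For non-negative i the two agree. With i = -1, zero extension would give 0x00000000FFFFFFFF, an offset of almost 4 GiB past the start of the string, while sign extension gives -1, one byte before it. That is why the index is sign-extended before the add.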
