Reputation: 601
I am playing around with program command line arguments. In particular, I am trying to do some testing on the string argv[1]. If I use a two-step method to get the address of argv[1], my code runs fine.
mov ebx, [ebp+12]
mov eax, [ebx+4] ; address of argv[1]
If I use one step, my program prints gibberish.
mov eax, [ebp+16] ; address of argv[1]
Am I incorrect in assuming that either method would now refer to the address [ebp+16]? Am I missing something trivial?
Upvotes: 0
Views: 2451
Reputation: 44076
It's easy to get confused when working with pointers to pointers in assembly.
argv is an "array of strings", or better, an array of pointers to char. Since in C arrays decay into pointers to their element type when passed as arguments, argv is in truth a pointer to pointer to char, or char** argv.
This tells us that we need two dereferences to access the chars of any of the strings, and one to access the pointer to any of those strings.
Assuming the cdecl convention, where parameters are passed on the stack in reverse order, and assuming a standard prologue that sets up a standard frame pointer, the value of argv is at ebp+0ch (argc is at ebp+08h).
Note that ebp has the semantics of a pointer, so ebp+0ch is just pointer arithmetic to get another pointer, this time to the argv value.
If we were to give ebp+0ch a C type, it would be char***; hence two dereferences are needed to access the pointer argv[1].
The code to get argv[1] into ESI is:
;typeof(ebp+0ch) = char***
mov esi, DWORD [ebp+0ch] ;1st deref, esi = argv, typeof(esi) = char**
mov esi, DWORD [esi+04h] ;2nd deref, esi = argv[1], typeof(esi) = char*
;Optional: get a char
mov al, BYTE [esi] ;3rd deref, al = argv[1][0], typeof(al) = char
The types check.
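For comparison, here is a rough C analogue of the three dereferences above. It is only a sketch: the function name first_char_of_arg1 and the parameter name slot are made up for illustration, with slot playing the role of ebp+0ch (a pointer to the place where argv is stored).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: `slot` stands in for ebp+0ch, i.e. the address
   of the stack slot that holds argv, so its type is char***. */
char first_char_of_arg1(char ***slot) {
    assert(slot != NULL);

    char **argv = *slot;    /* 1st deref: like mov esi, DWORD [ebp+0ch] */
    char *arg1 = argv[1];   /* 2nd deref: like mov esi, DWORD [esi+04h] */
    return arg1[0];         /* 3rd deref: like mov al, BYTE [esi]       */
}
```

Dropping any one of the three steps leaves a value of the wrong type, which is exactly what happens when a single mov is used where two are needed.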
Sounds confusing?
Let's draw those pointers!
The stack                                The memory
100ch | 2000h  | argv                    2000h | 2008h   | argv[0]
1008h | 2      | argc                    2004h | 2010h   | argv[1]
1004h | yyyyyy | return address          2008h | file    | argv[0][0..3]
1000h | xxxxxx | old frame pointer       200ch | .a\0\0  | argv[0][4..7]
                                         2010h | -arg    | argv[1][0..3]
EBP = 1000h                              2014h | 1\0\0\0 | argv[1][4..7]
ebp+0ch is 1000h + 0ch = 100ch, and it is the address of the argv value.
mov esi, DWORD [ebp+0ch] is like mov esi, DWORD [100ch], and it sets ESI to 2000h.
2000h is the value of argv, which points to the first element of the array, so it is the address of argv[0].
The address of argv[1] is four bytes ahead, thus 2000h+04h = 2004h.
mov esi, DWORD [esi+04h] is like mov esi, DWORD [2004h], and it sets ESI to 2010h.
2010h is the address of the string "-arg1".
Note that the picture above is not C or C++ standard compliant, as argv[argc] must be 0. I left that out of the picture.
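To see why the one-step version from the question reads something else, the picture can be modelled in C as a tiny table of 32-bit words. This is a toy sketch, not real stack access: load32, argv1_two_step, argv1_one_step and the PLACEHOLDER value are all made up for illustration, and the word at 1010h is an assumption since the picture does not show what lies there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit word of the toy memory, keyed by its fake address. */
typedef struct {
    uint32_t addr;
    uint32_t value;
} Word;

/* Assumption: the picture does not show the word at 1010h. */
#define PLACEHOLDER 0xDEADBEEFu

static const Word memory[] = {
    {0x1000u, 0u},          /* old frame pointer (value irrelevant here) */
    {0x1004u, 0u},          /* return address (value irrelevant here)    */
    {0x1008u, 2u},          /* argc                                      */
    {0x100Cu, 0x2000u},     /* argv                                      */
    {0x1010u, PLACEHOLDER}, /* whatever follows argv on the stack        */
    {0x2000u, 0x2008u},     /* argv[0]                                   */
    {0x2004u, 0x2010u},     /* argv[1]                                   */
};

/* Models `mov reg, DWORD [addr]`: look the address up in the table. */
uint32_t load32(uint32_t addr) {
    for (size_t i = 0; i < sizeof memory / sizeof memory[0]; ++i) {
        if (memory[i].addr == addr) {
            return memory[i].value;
        }
    }
    assert(0 && "address not present in the toy memory");
    return 0;
}

/* mov ebx, [ebp+12] then mov eax, [ebx+4]: two loads. */
uint32_t argv1_two_step(uint32_t ebp) {
    uint32_t argv = load32(ebp + 12); /* argv = 2000h              */
    return load32(argv + 4);          /* word stored at 2004h      */
}

/* mov eax, [ebp+16]: a single load from the stack itself. */
uint32_t argv1_one_step(uint32_t ebp) {
    return load32(ebp + 16);          /* word stored at 1010h      */
}
```

With EBP = 1000h, the two-step version follows argv into the memory on the right of the picture and returns 2010h, while the one-step version never leaves the stack and returns whatever sits just above argv.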
Upvotes: 4
Reputation: 1243
This is the answer to your question.
mov eax, [ebp+16]
lea ebx, [ebp+12]
mov eax, [ebx+4]
or
mov eax, [ebp+16]
mov ebx, ebp
add ebx, 12
mov eax, [ebx+4]
The former saves a few bytes of code, but they are functionally equivalent.
Upvotes: -2