Eric

Reputation: 601

NASM Memory Addressing

I am playing around with program command line arguments. In particular, I am trying to do some testing on the string argv[1]. If I use a two-step method to get the address of argv[1], my code runs fine.

mov ebx, [ebp+12]
mov eax, [ebx+4] ; address of argv[1]

If I use one step, my program prints gibberish.

mov eax, [ebp+16] ; address of argv[1]

Am I incorrect in assuming that either method would now refer to the address [ebp+16]? Am I missing something trivial?

Upvotes: 0

Views: 2451

Answers (2)

Margaret Bloom

Reputation: 44076

It's easy to get confused when working with pointers to pointers in assembly.

argv is usually described as an "array of strings" or, better, an array of pointers to char. Because arrays in C decay into pointers to their element type when passed as arguments, argv is really a pointer to pointer to char: char** argv.

This tells us that, starting from argv, we need two dereferences to access the chars of any of the strings and one to access the pointer to any of those strings.

Assuming the cdecl convention, where parameters are pushed on the stack in reverse order, and a standard prolog that sets up a standard frame pointer, argc is at ebp+08h and the value of argv is at ebp+0ch.
Note that ebp has the semantics of a pointer, so ebp+0ch is just pointer arithmetic that yields another pointer, this time to the argv value.

If we were to give ebp+0ch a C type it would be char***, hence two dereferences are needed to get the pointer argv[1].

The code to get argv[1] into ESI is:

;typeof(ebp+0ch) = char***

mov esi, DWORD [ebp+0ch]      ;1st deref, esi = argv, typeof(esi) = char**
mov esi, DWORD [esi+04h]      ;2nd deref, esi = argv[1], typeof(esi) = char*

;Optional, get a char
mov al, BYTE [esi]            ;3rd deref, al = argv[1][0], typeof(al) = char

The types check.


Sounds confusing?
Let's draw those pointers!

       The stack                                     The memory

100ch | 2000h  | argv                         2000h | 2008h   | argv[0]
1008h | 2      | argc                         2004h | 2010h   | argv[1]
1004h | yyyyyy | return address               2008h | file    | argv[0][0..3]
1000h | xxxxxx | old frame pointer            200ch | .a\0\0  | argv[0][4..7]
                                              2010h | -arg    | argv[1][0..3]
EBP = 1000h                                   2014h | 1\0\0\0 | argv[1][4..7]

ebp+0ch is 1000h + 0ch = 100ch and it is the address of the argv value.
mov esi, DWORD [ebp+0ch] is like mov esi, DWORD [100ch] and it sets ESI to 2000h.
2000h is the value of argv; since argv points to an array of pointers, 2000h is also the address of argv[0].

The address of argv[1] is four bytes ahead, thus 2000h+04h = 2004h.
mov esi, DWORD [esi+04h] is like mov esi, DWORD [2004h] and it sets ESI to 2010h.
2010h is the address of the string "-arg1".


Note that the picture above is not compliant with the C or C++ standards, which require argv[argc] to be 0 (a null pointer).
I left that out of the picture.

Upvotes: 4

Shift_Left

Reputation: 1243

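This is the answer to your question. In each listing below, the first line is your one-step instruction, and the lines after it are an equivalent sequence that computes the address ebp+12 without loading from memory and then reads the dword 4 bytes past it.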

mov eax, [ebp+16] 
lea ebx, [ebp+12] 
mov eax, [ebx+4]

or

mov eax, [ebp+16]
mov ebx, ebp
add ebx, 12
mov eax, [ebx+4]

The former saves a few bytes of code, but they are functionally equivalent.

Upvotes: -2
