I am trying to wrap my mind around pointers in Assembly.
What exactly is the difference between:
mov eax, ebx
and
mov [eax], ebx
and when should dword ptr [eax] be used?
Also, when I try to do mov eax, [ebx], I get a compile error. Why is this?
The previous answer is great and detailed. However, there is another aspect worth mentioning:
EAX is a 32-bit register that can hold a 32-bit number, which is a double word. (Its lower half is AX, 16 bits or a word, and AX in turn consists of AH and AL, its high and low bytes, 8 bits each.)
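To make the sub-register split concrete, here is a small C sketch that extracts the parts of a 32-bit value you would see through AX, AH and AL. The function names ax_of, ah_of and al_of are hypothetical, purely for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the x86 register aliases:
   AX = low 16 bits of EAX, AH = bits 8..15, AL = bits 0..7. */
uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint8_t  al_of(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```

For EAX = 12345678h this gives AX = 5678h, AH = 56h and AL = 78h.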
However, [EAX] refers to the memory at the 32-bit address held in EAX, and memory is made of consecutive 8-bit units. You need to specify what kind of data is held at that location, meaning you have to give the total length of the block: 8-bit BYTE, 16-bit WORD, or in this case 32-bit DOUBLE WORD.
mov eax,12345678h   ; 32-bit number, also used as the address (illustrative only)
mov [eax],eax       ; copy the number 12345678h to (same as mov dword[eax],eax)
                    ; the 12345678h-1234567Bh memory locations (4 bytes)
mov word[eax],ax    ; copy the number 5678h to
                    ; the 12345678h-12345679h memory locations (2 bytes)
mov byte[eax],al    ; copy the number 78h to
                    ; the 12345678h memory location (1 byte)
Note: the 64-bit (8-byte) quad word, QWORD, should work similarly, but I haven't worked with it yet, so no comment about it.
EDIT: as per a comment, when one operand is a register, (most) assemblers infer the size from that register, but when you store an immediate value into memory you need to be explicit about the size (e.g. mov dword [eax], 1).
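The effect of the three stores above can be modelled in C with a tiny simulated memory. This is a sketch, not real x86 code: the array mem and the helpers store8, store16 and store32 are hypothetical, and the bytes are written in little-endian order by hand (as x86 does) so the result doesn't depend on the host machine:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated byte-addressed memory (zero-initialised). */
#define MEM_SIZE 16
uint8_t mem[MEM_SIZE];

/* like mov byte [addr], al: touches 1 byte */
void store8(size_t addr, uint8_t v) {
    assert(addr < MEM_SIZE);
    mem[addr] = v;
}

/* like mov word [addr], ax: touches 2 bytes, low byte first */
void store16(size_t addr, uint16_t v) {
    store8(addr,     (uint8_t)(v & 0xFF));
    store8(addr + 1, (uint8_t)(v >> 8));
}

/* like mov dword [addr], eax: touches 4 bytes, low word first */
void store32(size_t addr, uint32_t v) {
    store16(addr,     (uint16_t)(v & 0xFFFF));
    store16(addr + 2, (uint16_t)(v >> 16));
}
```

After store32(0, 0x12345678), mem[0..3] holds 78h 56h 34h 12h; a 16-bit store changes only two bytes and an 8-bit store only one.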
As has already been stated, wrapping brackets around an operand means that that operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) that memory location, rather than using the operand's own value directly.
So, this:
mov eax, ebx
simply copies the value in ebx into eax. In pseudo-C notation, this would be: eax = ebx.
Whereas this:
mov eax, [ebx]
dereferences the contents of ebx and stores the pointed-to value in eax. In pseudo-C notation, this would be: eax = *ebx.
Finally, this:
mov [eax], ebx
stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
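The three forms can be sketched in C by treating registers as plain 32-bit numbers and memory as a byte array indexed by those numbers. Everything here (mem, load32, store32, the eax/ebx globals, the mov_* function names) is hypothetical scaffolding for the analogy; the brackets in the assembly correspond to the load32/store32 calls:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated byte-addressed memory; "addresses" are plain indices into it. */
#define MEM_SIZE 64
static uint8_t mem[MEM_SIZE];

/* [addr] used as a source: read the DWORD stored at addr. */
uint32_t load32(uint32_t addr) {
    assert(addr <= MEM_SIZE - 4);
    uint32_t v;
    memcpy(&v, &mem[addr], sizeof v);  /* host byte order; consistent for a round trip */
    return v;
}

/* [addr] used as a destination: write a DWORD at addr. */
void store32(uint32_t addr, uint32_t v) {
    assert(addr <= MEM_SIZE - 4);
    memcpy(&mem[addr], &v, sizeof v);
}

uint32_t eax, ebx;  /* "registers" holding plain 32-bit numbers */

void mov_eax_ebx(void)     { eax = ebx; }          /* mov eax, ebx   ; eax = ebx  */
void mov_eax_mem_ebx(void) { eax = load32(ebx); }  /* mov eax, [ebx] ; eax = *ebx */
void mov_mem_eax_ebx(void) { store32(eax, ebx); }  /* mov [eax], ebx ; *eax = ebx */
```

With ebx = 8 and the DWORD DEADBEEFh stored at address 8, mov_eax_ebx leaves eax = 8, while mov_eax_mem_ebx leaves eax = DEADBEEFh.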
The registers here could also be replaced with memory operands, such as symbolic variable names. So this:
mov eax, [myVar]
dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.
By contrast, this:
mov eax, myVar
stores the address of the variable myVar into eax, like eax = &myVar.
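In C terms the difference is between reading a variable and taking its address. A minimal sketch, with myVar standing in for a DWORD in the data section and the two function names being hypothetical; uintptr_t is used because on a 64-bit host an address doesn't fit in 32 bits:

```c
#include <assert.h>
#include <stdint.h>

int myVar = 42;  /* stands in for a DWORD variable in the data section */

/* mov eax, [myVar]  ->  eax = myVar (load the contents) */
uint32_t load_contents(void) { return (uint32_t)myVar; }

/* mov eax, myVar (NASM) or mov eax, OFFSET myVar (MASM)  ->  eax = &myVar */
uintptr_t load_address(void) { return (uintptr_t)&myVar; }
```

Dereferencing the value returned by load_address gives back 42, the same value load_contents returns directly.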
At least, that's how most assemblers work. Microsoft's assembler (MASM) and the Microsoft C/C++ compiler's inline assembler are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around memory operands.
To get the address of a variable in MASM, you would use the OFFSET keyword:
mov eax, OFFSET myVar
However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:
mov eax, DWORD PTR [myVar] ; eax = myVar
or
mov DWORD PTR [myVar], eax ; myVar = eax
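A size specifier is necessary in other assemblers like NASM, which are not strongly typed and don't remember that myVar is a DWORD-sized memory location, whenever no register operand implies the size (for example, mov dword [myVar], 0). Note that NASM spells it without PTR.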
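The reason the size has to come from somewhere is that the bytes in memory carry no type. This C sketch reads the same four bytes at three different widths, which is the choice that BYTE, WORD and DWORD make for the instruction; the array and function names are hypothetical, and the little-endian combination is written out by hand to match x86:

```c
#include <assert.h>
#include <stdint.h>

/* Four bytes in memory, lowest address first. */
const uint8_t bytes[4] = { 0x78, 0x56, 0x34, 0x12 };

/* BYTE PTR [p] */
uint8_t read_byte(const uint8_t *p) { return p[0]; }

/* WORD PTR [p] */
uint16_t read_word(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* DWORD PTR [p] */
uint32_t read_dword(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The same address yields 78h, 5678h or 12345678h depending only on the width you ask for.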
You don't need this at all when the other operand is a register, since the name of the register indicates its size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
Also when I try to do mov eax, [ebx] I get a compile error, why is this?
Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:
mov eax, DWORD PTR [ebx]
and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.
Why can't I do mov a, [eax]? Shouldn't that make "a" a pointer to wherever eax is pointing?
No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):
mov register, register ; copy one register to another
mov register, memory ; load value from memory into register
mov memory, register ; store value from register into memory
mov register, immediate ; move immediate value (constant) into register
mov memory, immediate ; store immediate value (constant) in memory
Notice that there is no mov memory, memory, which is what you were trying.
However, you can make a point to what eax is pointing to by simply coding:
mov DWORD PTR [a], eax
Now a and eax have the same value. If eax was a pointer, then a is now a pointer to that same memory location.
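In C, the store above is just a pointer assignment. A minimal sketch, assuming 32-bit x86 where a pointer fits in a DWORD (so DWORD PTR [a] is the right size); target, eax, a and copy_pointer are hypothetical names for illustration:

```c
#include <assert.h>

int target = 7;      /* some int in memory */
int *eax = &target;  /* "eax" holds a pointer to target */
int *a;              /* the memory variable a, pointer-sized */

/* mov DWORD PTR [a], eax  ->  a = eax: copies the address, not the pointee */
void copy_pointer(void) { a = eax; }
```

After copy_pointer(), writing through a (for example *a = 99) changes target, and reading through eax sees the same change.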
If you want to set a to the value that eax is pointing to, then you will need to do:
mov eax, DWORD PTR [eax] ; eax = *eax
mov DWORD PTR [a], eax ; a = eax
Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:
mov edx, DWORD PTR [eax] ; edx = *eax
mov DWORD PTR [a], edx ; a = edx
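The difference between the two sequences can be checked with the same kind of simulated memory as before. Again a sketch with hypothetical names: mem, load32, store32, the eax/edx globals, and A_ADDR as the address of a are all assumptions for the model, not real x86 code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 32
uint8_t mem[MEM_SIZE];   /* simulated memory */
uint32_t eax, edx;       /* simulated registers */
#define A_ADDR 0u        /* hypothetical address of the variable a */

uint32_t load32(uint32_t addr) {
    assert(addr <= MEM_SIZE - 4);
    uint32_t v;
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}

void store32(uint32_t addr, uint32_t v) {
    assert(addr <= MEM_SIZE - 4);
    memcpy(&mem[addr], &v, sizeof v);
}

/* mov eax, [eax] / mov [a], eax : the pointer in eax is overwritten */
void copy_pointee_clobbering(void) {
    eax = load32(eax);
    store32(A_ADDR, eax);
}

/* mov edx, [eax] / mov [a], edx : eax still holds the pointer afterwards */
void copy_pointee_with_scratch(void) {
    edx = load32(eax);
    store32(A_ADDR, edx);
}
```

Both versions leave the pointed-to value in a; only the scratch-register version leaves eax still holding the original address.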
I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.