Yuval Adam

Reputation: 165242

Dereferencing a label in x86 assembly

Consider this x86 assembly code:

section .data

foo:
    mov ebx, [boo]
    mov [goo], ebx
goo:
    mov eax, 2
    mov eax, 3
    ret
boo:
    mov eax, 4
    mov eax, 5
    ret

What exactly is going on here? When I dereference [boo] and mov it to [goo] what exactly am I moving there? Just one command? The ret as well?


Follow-up questions:

  1. Does dereferencing a label give me an address? Or the machine code for the first command in the label?
  2. If it's machine code - how can it possibly be more than one command? Aren't all commands essentially 32-bit (even if not all bits are used)?
  3. Bottom line - will eax have a value of 3 or 5 at the end?

Upvotes: 7

Views: 10478

Answers (3)

Ruben Bartelink

Reputation: 61795

The first mov copies four bytes from the offset boo, relative to the DS segment register, into ebx. The second mov writes ebx to the offset goo, again relative to DS. If CS and DS refer to the same memory, the distinction can be ignored. Even then, you're next likely to run into various protection mechanisms that render code sections read-only.

RE followups:

  1. A label isn't like a reference - you don't dereference it as such. The assembler substitutes a number representing the location in the resulting code. You can load either the address or the thing at the address. The [ and ] indicate dereferencing (I've fixed a confusing element in my first response to cover this). In other words, [boo] loads the thing at that address.
  2. A CISC instruction set like x86 has [very] variable-length instructions, some not even a multiple of the word length. RISC ones generally try to restrict this to make decoding instructions simpler.
  3. 3 - you are only modifying the first 4 bytes of mov eax, 2. Because of the little-endian encoding, the immediate 2 does get replaced with 4, but eax is then overwritten by the next instruction, mov eax, 3, which hasn't been modified at all. 5 is never in the picture as a candidate. (I thought you were thinking the code gets reordered, given the way you first asked the question[1], though you clearly know quite a bit more, as I should have guessed from your rep :P)
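
To make point 3 concrete, here is a rough Python simulation of the byte copy. It is only a sketch: it assumes 32-bit code, where mov eax, imm32 encodes as B8 followed by a little-endian immediate and ret as C3, and the helper names (mov_eax, run) are made up for illustration.

```python
# Encodings assumed: B8 imm32 = mov eax, imm32 (5 bytes); C3 = ret (1 byte).
def mov_eax(imm):
    return bytes([0xB8]) + imm.to_bytes(4, "little")

RET = b"\xC3"

goo = bytearray(mov_eax(2) + mov_eax(3) + RET)
boo = bytes(mov_eax(4) + mov_eax(5) + RET)

# mov ebx, [boo] / mov [goo], ebx: copy 4 bytes from boo over the start of goo.
goo[0:4] = boo[0:4]

def run(code):
    """Toy interpreter that understands only `mov eax, imm32` and `ret`."""
    eax, pc = 0, 0
    while True:
        opcode = code[pc]
        if opcode == 0xB8:
            eax = int.from_bytes(code[pc + 1:pc + 5], "little")
            pc += 5
        elif opcode == 0xC3:
            return eax
        else:
            raise ValueError(f"unsupported opcode {opcode:#x}")

print(goo.hex(" "))  # patched bytes of goo
print(run(goo))      # value of eax when goo returns
```

The first instruction at goo turns into mov eax, 4, the second is untouched, so the toy run ends with eax = 3.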

Note that all of this assumes that CS = DS and DEP isn't stepping in.

Also, if you were using BX instead of EBX, the sort of thing you were expecting would come into play: using xX instead of ExX accesses the low 2 bytes of the register (and xL accesses the lowest byte).
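
The register aliasing can be modelled with plain bit masks. This is just an illustrative sketch in Python; the value in ebx is an arbitrary made-up number, and the last lines assume that a 16-bit mov moves exactly 2 bytes.

```python
ebx = 0x12345678          # arbitrary example value, not from the question

bx = ebx & 0xFFFF         # low 2 bytes of EBX
bl = ebx & 0xFF           # lowest byte of EBX
bh = (ebx >> 8) & 0xFF    # second-lowest byte of EBX

print(hex(bx), hex(bl), hex(bh))

# With BX instead of EBX, only 2 bytes would travel from boo to goo.
goo = bytearray.fromhex("b8 02 00 00 00")   # mov eax, 2
boo = bytes.fromhex("b8 04 00 00 00")       # mov eax, 4
goo[0:2] = boo[0:2]
print(goo.hex(" "))
```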

[1] Remember that an assembler is purely a tool for writing opcodes - labels and the like all get boiled down to numbers, with very little magic or impressive transformation of the code - there are no closures or anything deep of that nature lurking in there. (This is slightly oversimplifying - code can be relocatable, and in many cases fixups get applied to usages of offsets by a combination of the linker and the loader.)
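
As a rough illustration of labels boiling down to numbers, the sketch below walks the question's listing and adds up instruction lengths. The lengths assume 32-bit mode (6 bytes for a mov between ebx and an absolute address, 5 for mov eax, imm32, 1 for ret) and a section that starts at offset 0; relocation is ignored, and label_offsets is a hypothetical helper.

```python
# Labels are plain strings; instructions are (text, encoded length in bytes).
listing = [
    "foo:",
    ("mov ebx, [boo]", 6),   # 8B 1D disp32
    ("mov [goo], ebx", 6),   # 89 1D disp32
    "goo:",
    ("mov eax, 2", 5),       # B8 imm32
    ("mov eax, 3", 5),
    ("ret", 1),              # C3
    "boo:",
    ("mov eax, 4", 5),
    ("mov eax, 5", 5),
    ("ret", 1),
]

def label_offsets(items):
    """Return {label: byte offset} by summing the lengths seen so far."""
    offsets, position = {}, 0
    for item in items:
        if isinstance(item, str):
            offsets[item.rstrip(":")] = position
        else:
            position += item[1]
    return offsets

print(label_offsets(listing))
```

Under those assumptions the assembler would simply substitute 12 for goo and 23 for boo (plus whatever base address the linker and loader apply).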

Upvotes: 3

Gunther Piez

Reputation: 30439

Follow up answers:

  1. It gives you the machine code starting at the address. How much of it depends on the length of your load; in this case it is 4 bytes.

  2. It can be more than one command or only a fragment of a command. On this architecture (Intel x86), machine code commands are between 8 and 120 bits long (1 to 15 bytes).

  3. 3.
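
To see how a fixed-size load can cover a fragment of one command or several at once, here is a small Python sketch over the bytes of the question's boo block. The instruction table is written out by hand (assuming the usual 32-bit encodings), and covered is a made-up helper.

```python
code = bytes.fromhex("b8 04 00 00 00 b8 05 00 00 00 c3")

# (start offset, length in bytes, text) for each instruction in `code`.
instructions = [(0, 5, "mov eax, 4"), (5, 5, "mov eax, 5"), (10, 1, "ret")]

def covered(offset, size=4):
    """List (text, bytes covered, total bytes) for a `size`-byte load at `offset`."""
    result = []
    for start, length, text in instructions:
        lo = max(start, offset)
        hi = min(start + length, offset + size)
        if lo < hi:
            result.append((text, hi - lo, length))
    return result

print(covered(0))  # load at boo: part of a single instruction
print(covered(7))  # load further in: tail of one instruction plus a whole ret
```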

Upvotes: 2

Bastien Léonard

Reputation: 61713

boo is the offset of the instruction mov eax, 4 inside section .data. mov ebx, [boo] means “fetch the four bytes at the offset indicated by boo into ebx”. Likewise, mov [goo], ebx moves the content of ebx to the offset indicated by goo.

However, code is often read-only, so it wouldn't be surprising to see the code just crash.

Here is how the instructions at boo are encoded:

boo:
b8 04 00 00 00          mov    eax,0x4
b8 05 00 00 00          mov    eax,0x5
c3                      ret

So what you get in ebx is actually 4/5 of the mov eax, 4 instruction.
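
For the curious, this is roughly what ends up in ebx, sketched in Python. It takes the first instruction at boo from the question's listing (mov eax, 4, assumed to encode as b8 04 00 00 00) and reads the first four bytes as a little-endian integer, which is how x86 loads a dword.

```python
first_instruction = bytes.fromhex("b8 04 00 00 00")   # mov eax, 4

# mov ebx, [boo] reads 4 bytes, least significant byte first.
ebx = int.from_bytes(first_instruction[:4], "little")
left_behind = first_instruction[4:]                   # the fifth byte is not loaded

print(f"{ebx:#010x}")     # ebx as a 32-bit hex value
print(left_behind.hex())  # the byte of the instruction that was not copied
```

The opcode byte b8 lands in the low byte of ebx, and the last byte of the immediate is never read.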

Upvotes: 10
