Coder88

Reputation: 1055

I don't understand the difference between assembly code and machine code, if assembly instructions are equivalent to machine code instructions

Some people say that assembly language = machine language, just that we use mnemonics in assembly language.

After reading Petzold's "CODE", I still can't understand how some assembly code is translated into machine code.

For example (from Tutorials Point's Assembly Course):

_start:             ;tells linker entry point
   mov  edx,len     ;message length
   mov  ecx,msg     ;message to write

section .data
msg db 'Hello, world!', 0xa  ;our dear string

What I understand is that msg contains "Hello, world!" and it's moved into ECX.

But as far as I know, in x86 ECX can store only 32 bits.

Then how can we move "Hello, world!" - which is more than 32 bits - into ECX?

And what is the equivalent of this part

section .data
msg db 'Hello, world!', 0xa  ;our dear string

in machine code?

Upvotes: 4

Views: 449

Answers (5)

Peter Cordes

Reputation: 363970

For x86, Intel's insn reference manual lists all the encodings for every instruction (see the links in https://stackoverflow.com/tags/x86/info).

mov ecx, msg uses the mov r32, imm32 encoding. The address of msg is filled into the 4 immediate bytes of that instruction at link time, because that's when the final absolute address is determined.

mov ecx, [msg] would be a 4-byte load from an absolute address (the start of msg). It would be encoded as mov r32, r/m32, using a memory-operand encoding for the source.

len is probably defined with an equ assembler directive. So it's a symbol, but its value isn't an address; its value is a number computed in the asm source file. msg is a symbol and also a label, whose value is an address.
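To make the encodings concrete, here is a sketch of a small NASM file with the machine-code bytes written as comments next to each line. It assumes a 32-bit Linux target and that len is defined as len equ $ - msg (the question's excerpt doesn't show that line). The file name enc.asm and the exit sequence at the end are hypothetical additions, there only so the file assembles into a complete program.

```nasm
; Assumed build: nasm -f elf32 enc.asm -o enc.o && ld -m elf_i386 enc.o -o enc
section .text
global _start
_start:
    mov  edx, len      ; BA 0E 00 00 00   mov r32, imm32; len = 14 = 0x0E
    mov  ecx, msg      ; B9 + 4 bytes     mov r32, imm32; address of msg, filled in at link time
    mov  ecx, [msg]    ; 8B 0D + 4 bytes  mov r32, r/m32; loads the 4 bytes 'Hell' from memory
    mov  eax, 1        ; B8 01 00 00 00   system call number for exit
    xor  ebx, ebx      ; 31 DB            exit status 0
    int  0x80          ; CD 80            enter the kernel

section .data
msg db 'Hello, world!', 0xa   ; 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0A
len equ $ - msg               ; emits no bytes; just defines len = 14
```

So the db line is not an instruction at all: its "machine code" is simply the 14 ASCII bytes of the string, placed in the data section. Assembling with nasm's -l option writes a listing file that shows the bytes NASM actually emits, which is a quick way to check the comments above.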

Upvotes: 0

rcgldr

Reputation: 28818

The syntax depends on the assembler. For MASM (Microsoft's assembler, invoked as ML), the syntax would be

        mov     ecx,offset msg    ;ecx = offset (address) of msg

which makes it clear that the offset or address of msg is being loaded into ecx, as opposed to the first 4 bytes of msg.

Upvotes: 1

thb

Reputation: 14434

Your question is a good one. It gets at the fundamental computer concept of indirection.

The normal way for a computer to treat a string of text like "Hello, world!" is to keep it in memory as a series of characters. For example:

Memory address    Memory contents
8201              'H'
8202              'e'
8203              'l'
8204              'l'
8205              'o'
8206              ','
8207              ' '
...               ...
820E              0

The value of msg in this example is 0x8201. It is not 'H'. Therefore, the value 0x8201 is moved to register ecx.

Later, anyone who wants the message can read the 0x8201 out of ecx, then go to memory address 0x8201 to find the start of the actual text message. Does this make sense?
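The two steps (put the address in ecx, then go to that address) can be sketched in NASM as follows. This is a minimal illustration, assuming a 32-bit Linux target; the file name deref.asm and the use of the exit status to expose the loaded byte are hypothetical choices made for the example.

```nasm
; Assumed build: nasm -f elf32 deref.asm -o deref.o && ld -m elf_i386 deref.o -o deref
section .text
global _start
_start:
    mov   ecx, msg            ; ecx = address of the first byte, not the text itself
    movzx ebx, byte [ecx]     ; follow the pointer: ebx = 'H' = 0x48 = 72
    movzx edx, byte [ecx+1]   ; the next byte in memory: edx = 'e' = 0x65
    mov   eax, 1              ; system call number for exit (int 0x80 convention)
    int   0x80                ; exit status = low byte of ebx

section .data
msg db 'Hello, world!', 0xa
```

On a system that can run 32-bit Linux binaries, running ./deref and then echo $? should print 72, the ASCII code of 'H', which shows that the register held only an address and the character came from memory.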

Upvotes: 1

Zbynek Vyskovsky - kvr000

Reputation: 18825

With msg db you define a label for the address where the string's sequence of bytes is stored. With mov ecx, msg you load just this address, not its contents. It is then possible to read the string by loading from [ecx], [ecx+1], etc.

.data defines a program section. .text usually contains machine code, and .data contains modifiable program data. There can be other sections too, for example ones holding exception-handling information.

Upvotes: 2

Codor

Reputation: 17595

To my understanding, the command

mov ecx,msg

does not actually move the whole string Hello, world! to the register, but rather a pointer to its beginning. The assembler directive

msg db 'Hello, world!', 0xa

apparently defines a memory location which contains the actual string Hello, world! and can be referenced by the label msg. However, how the pointer is actually used is hard to tell, because the code that uses the register's contents is missing from the excerpt.
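A plausible continuation, assuming the 32-bit Linux int 0x80 system-call convention that tutorials of this kind typically use, looks like the sketch below. Everything other than the two mov instructions and the msg line from the question is an assumption, not necessarily the original author's code.

```nasm
; Assumed build: nasm -f elf32 hello.asm -o hello.o && ld -m elf_i386 hello.o -o hello
section .text
global _start
_start:
    mov  edx, len     ; argument 3: number of bytes to write
    mov  ecx, msg     ; argument 2: address of the buffer (the pointer)
    mov  ebx, 1       ; argument 1: file descriptor 1 (stdout)
    mov  eax, 4       ; system call number 4 = write
    int  0x80         ; kernel reads len bytes starting at the address in ecx
    mov  eax, 1       ; system call number 1 = exit
    xor  ebx, ebx     ; exit status 0
    int  0x80

section .data
msg db 'Hello, world!', 0xa   ; the string bytes plus a newline
len equ $ - msg               ; assembly-time constant: length of msg in bytes
```

Here the kernel is the one that follows the pointer: the write call receives only the address and the length, and reads the characters from memory itself.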

Upvotes: 0
