Reputation: 1055
Some people say that assembly language = machine language, except that assembly language uses mnemonics.
After reading Petzold's "CODE", I still can't understand how some assembly instructions are translated into machine code.
For example (from Tutorials Point's Assembly Course):
_start: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
section .data
msg db 'Hello, world!', 0xa ;our dear string
What I understand is that msg contains "Hello, world!" and that it is moved into ECX.
But as far as I know, on x86 ECX can hold only 32 bits.
So how can we move "Hello, world!", which is more than 32 bits, into ECX?
And what is the equivalent of this part
section .data
msg db 'Hello, world!', 0xa ;our dear string
in machine code?
Upvotes: 4
Views: 449
Reputation: 363970
For x86, Intel's instruction-set reference manual lists the encodings for every instruction (see the links in https://stackoverflow.com/tags/x86/info).
mov ecx, msg
uses the mov r32, imm32 encoding. The address of msg is filled into those 4 immediate bytes of the instruction at link time, because that's when the final absolute address is determined.
mov ecx, [msg]
would be a 4-byte load from an absolute address (the start of msg). It would be encoded as mov r32, r/m32, using a memory-operand encoding for the source.
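To make the two encodings concrete, here is a minimal Python sketch that builds the instruction bytes by hand for 32-bit mode. The address 0x0804A000 is a made-up placeholder; the real value is whatever the linker assigns to msg. The helper names are only for illustration.

```python
import struct

MSG_ADDR = 0x0804A000  # hypothetical address the linker assigned to msg

def encode_mov_ecx_imm32(imm):
    """mov ecx, imm32: opcode B8+rd (rd = 1 for ECX), then 4 little-endian bytes."""
    return bytes([0xB8 + 1]) + struct.pack("<I", imm)

def encode_mov_ecx_mem_abs(addr):
    """mov ecx, [disp32]: opcode 8B, ModRM 0D (mod=00, reg=ECX, rm=disp32), then the address."""
    return bytes([0x8B, 0x0D]) + struct.pack("<I", addr)

print(encode_mov_ecx_imm32(MSG_ADDR).hex(" "))    # b9 00 a0 04 08
print(encode_mov_ecx_mem_abs(MSG_ADDR).hex(" "))  # 8b 0d 00 a0 04 08
```

Either way the instruction carries only a 4-byte address. The string itself never appears in the instruction stream.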
len is probably defined with an equ assembler directive. So it's a symbol, but its value isn't an address. Instead, its value is a number computed from the asm source file at assembly time. msg is a symbol, and also a label, whose value is an address.
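As for what the data section turns into: the db line just emits the raw bytes of the string. A rough Python sketch follows, assuming len is defined as len equ $ - msg (the question doesn't show that line, so this is a guess) and using a made-up address for msg.

```python
# Bytes that `msg db 'Hello, world!', 0xa` places in the data section
msg_bytes = b"Hello, world!" + bytes([0x0A])

# Assumed definition (not shown in the question): len equ $ - msg
MSG_ADDR = 0x0804A000              # hypothetical address of msg
here = MSG_ADDR + len(msg_bytes)   # `$`: the position right after the string
length = here - MSG_ADDR           # a plain number, not an address

print(msg_bytes.hex(" "))  # 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a
print(length)              # 14
```

So under that assumption mov edx, len carries the constant 14 as its immediate, while mov ecx, msg carries the address of those 14 bytes.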
Upvotes: 0
Reputation: 28818
The syntax depends on the assembler. For MASM or ML (Microsoft's version of MASM), the syntax would be
mov ecx,offset msg ;ecx = offset (address) of msg
which makes it clear that the offset or address of msg is being loaded into ecx, as opposed to the first 4 bytes of msg.
Upvotes: 1
Reputation: 14434
Your question is a good one. It gets at the fundamental computer concept of indirection.
The normal way for a computer to treat a string of text like "Hello, world!" is to keep it in memory as a series of characters. For example:
Memory address    Memory contents
8201              'H'
8202              'e'
8203              'l'
8204              'l'
8205              'o'
8206              ','
8207              ' '
...               ...
820E              0
The value of msg in this example is 0x8201. It is not 'H'. Therefore, the value 0x8201 is moved to register ecx.
Later, anyone who wants the message can read the 0x8201 out of ecx, then go to memory address 0x8201 to find the start of the actual text message. Does this make sense?
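If it helps, the table above can be modelled in a few lines of Python. This is only a toy: memory is a dict from address to byte, ecx is an ordinary variable, and read_string is a helper name invented for the example.

```python
BASE = 0x8201  # address of the first character, as in the table above

# Toy memory: one byte per address, with a 0 terminator at 0x820E
memory = {BASE + i: b for i, b in enumerate(b"Hello, world!\x00")}

ecx = BASE  # mov ecx, msg: the register holds the address, not the text

def read_string(mem, addr):
    """Follow the address and collect bytes until the 0 terminator."""
    out = bytearray()
    while mem[addr] != 0:
        out.append(mem[addr])
        addr += 1
    return out.decode("ascii")

print(hex(ecx))                  # 0x8201
print(read_string(memory, ecx))  # Hello, world!
```

The register only ever holds one small number. The 13 characters stay in memory and are fetched one at a time through that number.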
Upvotes: 1
Reputation: 18825
With msg db you define an address at which the string's sequence of bytes is stored. With mov ecx, msg you load just this address, not its contents. The string can then be read by loading [ecx], [ecx+1], etc.
section .data defines a program section. .text usually contains machine code, while .data holds modifiable program data. There can be more sections, such as ones for exception-handling information.
Upvotes: 2
Reputation: 17595
To my understanding, the command
mov ecx,msg
does not actually move the whole string Hello, world! to the register, but rather a pointer to its beginning. The assembler directive
msg db 'Hello, world!', 0xa
defines a memory location which contains the actual string Hello, world! and can be referenced by the label msg. However, how the register is actually used is hard to tell, because the excerpt does not show the code that reads it afterwards.
Upvotes: 0