Reputation: 4767
I have stored a one-byte value of 8 and I'd like to move that into the rax register. I'm currently doing this with movzx to zero-extend the byte:
.globl main
main:
push %rbp
mov %rsp, %rbp
movb $8, -1(%rbp)
movzx -1(%rbp), %rax   # <-- here
...
How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long? From here it says, if I'm reading it properly, that it can work on both a byte and a word, but how would it know? For example, if I added a two-byte value at -2(%rbp), how would it know to grab the two-byte value? Is there another instruction where I can just grab a one-, two-, or four-byte value at an address and insert it into a 64-bit register?
I suppose another way to do it would be to first zero-out the register and then add it to the 8-bit (or however many bits) component, such as:
mov $0, %rax
mov -1(%rbp), %al
Is there one way that is more preferred than another way?
Upvotes: 0
Views: 1694
Reputation: 365586
It's ambiguous and relies on some default; you shouldn't write code like that.
That's why AT&T syntax has movzb and movzw instructions (typically used as movzbl -1(%rbp), %eax), for the two different source sizes of the Intel-syntax movzx mnemonic. See Are x86 Assembly Mnemonic standarized? (no, AT&T makes up new names.)
And yes, you could xor %eax,%eax / mov -1(%rbp), %al to merge into the low byte, but that's pointlessly inefficient. x86-64 guarantees the availability of 386 instructions like movzx.
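As a minimal sketch of the explicit-size forms for each source width (the file name demo.s, the stack offsets, and the stored values are illustrative assumptions, not from the question):

```asm
# Sketch for GAS (AT&T syntax) on x86-64 Linux.
# Offsets and values are made up for illustration.
    .globl main
main:
    push %rbp
    mov  %rsp, %rbp

    movb $8,     -1(%rbp)       # one-byte value
    movw $300,   -4(%rbp)       # two-byte value (bytes -4 and -3)
    movl $70000, -8(%rbp)       # four-byte value (bytes -8 to -5)

    movzbl -1(%rbp), %eax       # 1 byte  -> EAX; writing EAX zeroes the top half of RAX
    movzwl -4(%rbp), %ecx       # 2 bytes -> ECX; RCX = 300
    movl   -8(%rbp), %edx       # 4 bytes -> EDX; a plain 32-bit mov already zero-extends into RDX

    pop %rbp
    ret                         # returns with EAX = 8
```

Built with gcc demo.s -o demo and run as ./demo; echo $?, this should print 8, because main returns the zero-extended byte in EAX. Each mnemonic spells out the source size (b or w) and the destination size (l), so nothing is left to an assembler default.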
Surprisingly, movzx -1(%rbp), %rax does assemble. If you assemble it, then disassemble back into AT&T syntax with objdump -d foo.o, you get movzbq (byte to quad), including a useless REX prefix instead of letting implicit zero-extension do the job after writing EAX.
48 0f b6 45 ff movzbq -0x1(%rbp),%rax
Or disassemble into Intel syntax with objdump -drwC -Mintel:
48 0f b6 45 ff movzx rax,BYTE PTR [rbp-0x1]
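To make the cost of the REX prefix concrete, here is a small side-by-side sketch. The file name enc.s is hypothetical. The first encoding is the one shown in the disassembly above, and the second is the same opcode without the 48 prefix:

```asm
# enc.s: assemble with `gcc -c enc.s`, then inspect with `objdump -d enc.o`
    movzbq -1(%rbp), %rax       # 48 0f b6 45 ff   (REX.W prefix, 5 bytes)
    movzbl -1(%rbp), %eax       #    0f b6 45 ff   (4 bytes, same value ends up in RAX)
```

Both instructions leave the zero-extended byte in RAX. The 32-bit destination form is one byte shorter because writing EAX already clears the upper 32 bits.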
Fun fact: GAS can't infer movzb vs. movzw if you write just movz, because movz isn't an instruction mnemonic. Unlike operand-size suffixes that can be inferred from the operands, the b and w are treated as part of the mnemonic. But you can write movzx and then it will infer both sizes from register operands, just like in Intel-syntax mode.
5: 0f b6 c0 movzbl %al,%eax # source: movzx %al, %eax
8: 0f b7 c0 movzwl %ax,%eax # source: movzx %ax, %eax
movzw and movzb act like instruction mnemonics in their own right (that can infer a size suffix from the destination register). Semi-related: What does the MOVZBL instruction do in IA-32 AT&T syntax?
Also related: a table of cdq and related instructions, with their movsx equivalents and AT&T names: What does cltq do in assembly?
Also related: MOVZX missing 32 bit register to 64 bit register - because that's implicit in writing a 32-bit register.
Upvotes: 4
Reputation: 18523
How does the movzx instruction 'know' that the value at -1(%rbp) is only one byte long?
There are two (or even three) instructions: movzxb (-1(%rbp) is one byte long) and movzxw (-1(%rbp) is one 16-bit word long).
My assembler interprets movzx as movzxb; however, you should not rely on that!
It's better to use the instruction name including the source size (movzxb or movzxw) to ensure that the assembler uses the correct instruction.
Upvotes: 2