I built a simple program and used the `file` command to check that it is in 32-bit format. Then I used objdump to disassemble it and found that some instructions are longer than 4 bytes.
Since the program is 32-bit, I expected that no instruction would be longer than 4 bytes. Obviously I am wrong. Could you please tell me why there are 6-byte or 7-byte instructions? Thanks.
```
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=09aa196a671a6e169f09984360133ad9488f7e53, not stripped
$ objdump -d a.out

a.out:     file format elf32-i386

Disassembly of section .init:

080482a8 <_init>:
 80482a8:   53                      push   %ebx
 80482a9:   83 ec 08                sub    $0x8,%esp
 80482ac:   e8 8f 00 00 00          call   8048340 <__x86.get_pc_thunk.bx>
 80482b1:   81 c3 4f 1d 00 00       add    $0x1d4f,%ebx
 80482b7:   8b 83 fc ff ff ff       mov    -0x4(%ebx),%eax
 80482bd:   85 c0                   test   %eax,%eax
 80482bf:   74 05                   je     80482c6 <_init+0x1e>
 80482c1:   e8 3a 00 00 00          call   8048300 <__libc_start_main@plt+0x10>
 80482c6:   83 c4 08                add    $0x8,%esp
 80482c9:   5b                      pop    %ebx
 80482ca:   c3                      ret
```
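To see the lengths at a glance, here is a small sketch that counts the machine-code bytes objdump prints for each instruction. It assumes each instruction's bytes sit on one line (objdump can wrap long instructions onto a continuation line, which this sketch does not handle) and uses a few lines copied from the dump above with tabs collapsed to spaces.

```python
import re

# Lines copied from the objdump -d output above (tabs collapsed to spaces).
DUMP = """\
80482a8: 53 push %ebx
80482a9: 83 ec 08 sub $0x8,%esp
80482ac: e8 8f 00 00 00 call 8048340 <__x86.get_pc_thunk.bx>
80482b1: 81 c3 4f 1d 00 00 add $0x1d4f,%ebx
80482b7: 8b 83 fc ff ff ff mov -0x4(%ebx),%eax
80482bd: 85 c0 test %eax,%eax
"""

HEX_BYTE = re.compile(r"[0-9a-f]{2}")


def instruction_lengths(dump):
    """Return (address, length_in_bytes, text) for each disassembled line."""
    out = []
    for line in dump.splitlines():
        if ":" not in line:
            continue
        addr, rest = line.split(":", 1)
        tokens = rest.split()
        n = 0
        # The leading two-digit hex tokens are the instruction's bytes.
        while n < len(tokens) and HEX_BYTE.fullmatch(tokens[n]):
            n += 1
        out.append((int(addr, 16), n, " ".join(tokens[n:])))
    return out


for addr, length, text in instruction_lengths(DUMP):
    print(f"{addr:#x}: {length} bytes  {text}")
```

Both the `add $0x1d4f,%ebx` and the `mov -0x4(%ebx),%eax` lines come out at 6 bytes, because each carries a full 4-byte constant (an immediate or a displacement).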
Why? One obvious reason is so that a single instruction can include a 32-bit immediate, like `mov $address, %register`, and so that a `call rel32` can reach any 32-bit address from the current address.
These instructions need room for an opcode (1 byte) and sometimes a ModR/M byte to specify which register(s) or memory location are the operands.
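As a sketch of how `call rel32` reaches its target: the 4 bytes after the `E8` opcode are a signed little-endian offset measured from the end of the call instruction, and the sum wraps within the 32-bit address space. The function name below is just for illustration; the inputs are the two calls from the question's dump.

```python
import struct


def call_rel32_target(addr, code):
    """Return the target of a 5-byte `E8 rel32` call located at `addr`."""
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("expected E8 followed by a 4-byte rel32")
    (rel32,) = struct.unpack("<i", code[1:5])  # signed, little-endian
    next_ip = addr + len(code)                 # rel32 counts from the next instruction
    return (next_ip + rel32) & 0xFFFFFFFF      # wrap around the 32-bit address space


# The two calls in the question's _init disassembly:
print(hex(call_rel32_target(0x80482AC, bytes.fromhex("e8 8f 00 00 00"))))  # 0x8048340
print(hex(call_rel32_target(0x80482C1, bytes.fromhex("e8 3a 00 00 00"))))  # 0x8048300
```

The results match the targets objdump printed, and the 1 + 4 byte layout is exactly why those calls are 5 bytes long.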
If an instruction were limited to 4 bytes, it would take multiple instructions to put a static address into a register, and you couldn't use a static address directly in an absolute (memory-direct) addressing mode. RISC ISAs typically need 2 instructions to construct arbitrary 32-bit constants (including addresses) in a register, like MIPS `lui $t0, high_half` / `ori $t0, $t0, low_half`.
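Because every MIPS instruction is a fixed 4 bytes, only a 16-bit immediate fits in one. A minimal sketch of the two-instruction split, simulating what `lui` and `ori` do to the register (the example constant is the thunk address from the question's dump, used only as a sample value):

```python
def mips_lui_ori(value):
    """Split a 32-bit constant into the immediates for `lui` + `ori`."""
    value &= 0xFFFFFFFF
    high_half = value >> 16     # lui places this in the upper 16 bits
    low_half = value & 0xFFFF   # ori zero-extends this into the lower 16 bits
    return high_half, low_half


def rebuild(high_half, low_half):
    """Simulate the register contents after the lui/ori pair."""
    t0 = (high_half << 16) & 0xFFFFFFFF  # lui $t0, high_half (low bits cleared)
    t0 |= low_half                       # ori $t0, $t0, low_half
    return t0


high, low = mips_lui_ori(0x08048340)
print(f"lui $t0, {high:#x}")
print(f"ori $t0, $t0, {low:#x}")
```

`ori` is used rather than `addiu` because `ori` zero-extends its immediate, so the low half never disturbs the upper bits.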
x86 is variable-length CISC; common instructions are short, but longer instructions are possible instead of forcing you to construct an address or constant in a register with a separate instruction.
e.g. you can do `movl $123456, some_static_variable` and get an instruction encoding with these components:
mov_opcode (1B) | ModR/M (1B) | disp32 absolute address (4B) | imm32=123456 (4B)
for a total of 10 bytes, including two 4-byte values. (In Intel's instruction-set reference manual (vol. 2 of the x86 SDM), this is the `mov r/m32, imm32` form of MOV, with a `[disp32]` addressing mode.)
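Those four components can be assembled by hand. This sketch builds the bytes for `movl $123456, some_static_variable`; the address of `some_static_variable` is a hypothetical value (0x0804A01C) chosen only for illustration. The opcode is C7 with a /0 extension, and ModR/M mod=00, r/m=101 selects the `[disp32]` absolute form in 32-bit mode.

```python
import struct

MOV_RM32_IMM32 = 0xC7  # "mov r/m32, imm32" (C7 /0)


def modrm(mod, reg, rm):
    """Pack the three ModR/M fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_abs_imm32(address, imm):
    # mod=00, r/m=101 means [disp32] (absolute address) in 32-bit mode;
    # reg=000 is the /0 opcode extension for this form of MOV.
    return (bytes([MOV_RM32_IMM32, modrm(0b00, 0, 0b101)])
            + struct.pack("<I", address)  # disp32: the variable's address
            + struct.pack("<I", imm))     # imm32: the value to store


ADDR = 0x0804A01C  # hypothetical address of some_static_variable
code = encode_mov_abs_imm32(ADDR, 123456)
print(code.hex(" "), f"({len(code)} bytes)")  # c7 05 1c a0 04 08 40 e2 01 00 (10 bytes)
```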
You could make it longer with prefixes, for example an `fs:` segment-override prefix for thread-local storage. And/or the addressing mode could include a scaled-index register, like `movl $123456, array(,%ecx,4)`, so a SIB (scale/index/base) byte would be needed after the ModR/M byte to encode the addressing mode.
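A sketch of how the SIB byte and an optional `fs` prefix (0x64) grow the encoding for `movl $123456, array(,%ecx,4)`. The address of `array` is hypothetical (0x0804A040). With no base register, ModR/M r/m=100 signals that a SIB byte follows, and SIB base=101 with mod=00 means "no base, disp32 follows".

```python
import struct

FS_PREFIX = 0x64  # segment-override prefix for %fs
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_mov_indexed_imm32(disp, index, scale, imm, fs=False):
    """Encode `movl $imm, disp(,%index,scale)` (no base register)."""
    if index == "esp":
        raise ValueError("%esp cannot be used as an index register")
    modrm = (0b00 << 6) | (0 << 3) | 0b100  # r/m=100: a SIB byte follows
    # base=101 with mod=00: no base register, a disp32 follows the SIB byte.
    sib = (SCALE_BITS[scale] << 6) | (REG[index] << 3) | 0b101
    code = bytes([0xC7, modrm, sib]) + struct.pack("<I", disp) + struct.pack("<I", imm)
    return bytes([FS_PREFIX]) + code if fs else code


ARRAY = 0x0804A040  # hypothetical address of `array`
plain = encode_mov_indexed_imm32(ARRAY, "ecx", 4, 123456)
with_fs = encode_mov_indexed_imm32(ARRAY, "ecx", 4, 123456, fs=True)
print(plain.hex(" "), len(plain))      # 11 bytes
print(with_fs.hex(" "), len(with_fs))  # 12 bytes
```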
Instead of `mov`, we could have used `add`, and then we could also have used a `lock` prefix to make it an atomic read-modify-write.
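Sketching the `lock add` version: ADD to memory uses opcode 81 /0 with an imm32, or 83 /0 with a sign-extended imm8 when the constant fits in a byte, and LOCK is the single prefix byte F0. The static variable's address is again a hypothetical 0x0804A01C.

```python
import struct

LOCK = 0xF0            # LOCK prefix: makes the read-modify-write atomic
ADD_RM32_IMM32 = 0x81  # "add r/m32, imm32" (81 /0 id)
ADD_RM32_IMM8 = 0x83   # "add r/m32, imm8"  (83 /0 ib), imm8 is sign-extended


def encode_lock_add_abs(address, imm):
    """Encode `lock addl $imm, <absolute address>`."""
    modrm = 0x05  # mod=00, reg=/0, r/m=101 -> [disp32]
    disp = struct.pack("<I", address)
    if -128 <= imm <= 127:
        body = bytes([ADD_RM32_IMM8, modrm]) + disp + struct.pack("<b", imm)
    else:
        body = bytes([ADD_RM32_IMM32, modrm]) + disp + struct.pack("<I", imm & 0xFFFFFFFF)
    return bytes([LOCK]) + body


ADDR = 0x0804A01C  # hypothetical address of some_static_variable
print(encode_lock_add_abs(ADDR, 123456).hex(" "))  # 11 bytes
print(encode_lock_add_abs(ADDR, 1).hex(" "))       # 8 bytes
```

123456 does not fit in a sign-extended byte, so the full imm32 form is needed; a small constant like 1 gets the shorter imm8 encoding.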
The hard limit on instruction length is 15 bytes. If decoding doesn't find the end of an instruction by then, the CPU raises a general-protection exception (#GP). (A Linux kernel will deliver a SIGSEGV to the offending process.)
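The components discussed above simply add up, so checking an encoding against the limit is arithmetic. A minimal sketch (the function names are hypothetical) using the `%fs:array(,%ecx,4)` store as a baseline and then piling on redundant prefixes:

```python
MAX_X86_INSN_LEN = 15


def insn_length(prefixes=0, opcode=1, modrm=0, sib=0, disp=0, imm=0):
    """Sum the sizes (in bytes) of the parts of an x86 instruction."""
    return prefixes + opcode + modrm + sib + disp + imm


def fits_limit(length):
    return length <= MAX_X86_INSN_LEN


# movl $123456, %fs:array(,%ecx,4): fs prefix + C7 + ModR/M + SIB + disp32 + imm32
base = insn_length(prefixes=1, opcode=1, modrm=1, sib=1, disp=4, imm=4)
print(base, fits_limit(base))      # 12 True

# Adding redundant prefixes eventually crosses the 15-byte limit, and the CPU faults.
padded = insn_length(prefixes=5, opcode=1, modrm=1, sib=1, disp=4, imm=4)
print(padded, fits_limit(padded))  # 16 False
```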
(Fun fact: the original 8086 had no limit, and would happily keep looping trying to decode a whole 64k segment full of `rep` prefixes.)
The instruction length is not limited to 32 bits. From the x86 Wikipedia page:
The x86 architecture is a variable instruction length, primarily "CISC" design.
and
Encoding: Variable (1 to 15 bytes)
And from the Intel® 64 and IA-32 Architectures Software Developer's Manual:
The Intel386 processor sets a limit of 15 bytes on instruction length.