Lakshman Siddardha

Reputation: 19

Which mode on Intel x86-64 executes instructions fastest?

Intel has: real mode, protected mode, virtual real mode, and 64-bit mode.

Out of these modes, which one executes the same set of instructions faster?

Using prefixes, one can change the address and operand sizes to be similar to those of other modes.

Upvotes: 1

Views: 1059

Answers (1)

Peter Cordes

Reputation: 363980

TL:DR: Tell your compiler to make 64-bit executables to get max performance most of the time. But it can be worth benchmarking against a 32-bit build, especially if your code uses a lot of pointer-heavy data structures.

In theory, faster 64-bit code is almost always possible (and a few legacy realities like not assuming SSE2 for 32-bit, and the 32-bit legacy calling conventions, also favour 64-bit in practice), but sometimes having your program be faster in 64-bit mode would involve something like an ILP32 ABI such as Linux x32, or maybe using int_least32_t instead of long when you want a type that's at least 32-bit.


Intel (and AMD) CPUs don't have any inherent penalties that make decoding or execution less efficient in any mode (see footnote 1).

But some choices of operand-size are worse than others (e.g. 16-bit sucks because of partial-register false dependencies or stalls), and 16-bit code needs prefixes to use 32-bit operand-size and address-size. Intel CPUs don't have a problem decoding lots of prefixes, but larger code-size in general is a bad thing, reducing code density in L1I cache and sometimes in the uop cache.

Footnote 1: except if you're using 32-bit address-size in 16-bit mode, e.g. "big unreal mode". Then Intel P6-family CPUs (i.e. before Sandybridge) will have LCP stalls on every instruction with a 32-bit ModRM addressing mode in 16-bit mode, even if it's not actually length-changing, i.e. a false LCP stall. Address-size prefixes aren't useful in normal 32-bit mode (except as padding), so this problem is basically not relevant for 32-bit code.


64-bit code has larger instructions (because 64-bit operand size needs a REX prefix). Usually this doesn't matter, because the uop cache and L1I cache usually completely hide the effect of code-size on performance. 32 and 64-bit operand-size are both the same speed for most instructions, and 64-bit code can still use 32-bit operand-size except when it really needs wide types, to avoid the extra cost of 64-bit integer division (and the REX prefixes).
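As a rough illustration of "use 32-bit operand-size unless you really need wide types", here is a minimal sketch of a division helper that narrows to 32-bit when both operands fit. The function name and the nonzero-divisor precondition are my own assumptions for the example. Whether the extra branch pays off depends on the CPU and on your data, so benchmark before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide with 32-bit operand-size when both values
   fit in 32 bits, otherwise fall back to a full 64-bit division. */
static uint64_t div_narrow_if_possible(uint64_t n, uint64_t d)
{
    assert(d != 0);  /* assumed precondition: the caller passes a nonzero divisor */
    if ((n >> 32) == 0 && (d >> 32) == 0) {
        /* Both operands fit: the compiler can emit a 32-bit div, no REX.W needed. */
        return (uint32_t)n / (uint32_t)d;
    }
    return n / d;    /* really needs 64-bit operand-size */
}
```

If your values are known to fit in 32 bits at compile time, just using uint32_t for them gets the same effect without the branch.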

"The scenario is that I want to write a general program. I want to know which mode would be faster and why?"

This is a different question than what you asked.

Long mode is usually fastest because it usually takes fewer instructions to get the same work done, because of better calling conventions and more registers (fewer spills). Especially if you have any FP computation, or SIMD-friendly loops, 64-bit mode can be a big win because FP code can often take advantage of more registers.

But pointer-heavy data structures in 64-bit code can have up to twice the cache footprint of the same structures in 32-bit code (which can run in protected/compat mode), because every pointer is 8 bytes instead of 4. Also, having a 64-bit alignment requirement can result in more struct padding, so a pointer + int struct will take 16 bytes in 64-bit code, not just the 12 bytes of its members.
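To see the padding effect for yourself, here is a small sketch that measures a pointer + int struct. The struct and helper names are made up for the example. On a typical x86-64 ABI I'd expect a size of 16 with 4 bytes of padding, and on a typical 32-bit x86 ABI a size of 8 with no padding, but the code only computes the numbers rather than assuming them.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer + int node, as in a simple linked list (hypothetical example type). */
struct node {
    struct node *next;
    int value;
};

/* Bytes occupied by the members themselves, ignoring any padding. */
enum { NODE_PAYLOAD = sizeof(struct node *) + sizeof(int) };

/* Padding bytes the compiler added so arrays of nodes keep every pointer aligned. */
static size_t node_padding(void)
{
    return sizeof(struct node) - NODE_PAYLOAD;
}

/* The struct size is always a multiple of the pointer's alignment. */
static_assert(sizeof(struct node) % _Alignof(struct node *) == 0,
              "struct size must be a multiple of pointer alignment");
```

Printing sizeof(struct node) and node_padding() from a 64-bit build and a 32-bit build (e.g. gcc -m32, if you have the 32-bit libraries installed) shows the difference directly.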

So you can get more cache misses in 64-bit code, and this can make it slower than 32-bit. Linux's x32 ABI tries to get the best of both worlds (for code that doesn't need a lot of virtual address space): 32-bit pointers in long mode.
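If you benchmark several builds against each other, it helps to have the program report which ABI it was compiled for. This is a minimal sketch using the predefined macros that GCC and Clang provide (__x86_64__, __ILP32__, __i386__); other compilers use different macros, and the function name is just something I picked.

```c
#include <assert.h>
#include <string.h>

/* Report the target ABI, based on GCC/Clang predefined macros.
   Other compilers (e.g. MSVC) define different macros and land in "other". */
static const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32 (long mode, 32-bit pointers)";
#elif defined(__x86_64__)
    return "x86-64 (64-bit pointers)";
#elif defined(__i386__)
    return "i386 (32-bit mode)";
#else
    return "other";
#endif
}
```

With GCC, the three builds would typically come from -m64, -mx32 and -m32, assuming your system has the libraries for each target installed.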

Just storing 32-bit array indices instead of pointers can work, if all the "pointers" are into the same pool that you allocate from. But beware that it can result in worse load/use latency because you (or the compiler) need an indexed addressing mode, or a separate add instruction.
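A minimal sketch of the index-instead-of-pointer idea: a singly linked list whose nodes live in one static pool and link to each other by 32-bit index. The pool size, the NIL sentinel and all the names are assumptions for the example, and there is no freeing of nodes.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_CAPACITY 1024u
#define NIL UINT32_MAX          /* sentinel index meaning "no node" */

/* 8 bytes per node, even in 64-bit code: the link is an index, not a pointer. */
struct idx_node {
    uint32_t next;              /* index into pool[], or NIL */
    int32_t  value;
};

static struct idx_node pool[POOL_CAPACITY];
static uint32_t pool_used = 0;

/* Push a value at the head of a list. Returns the new head index,
   or NIL if the pool is full (the old head is left untouched). */
static uint32_t list_push(uint32_t head, int32_t value)
{
    if (pool_used == POOL_CAPACITY)
        return NIL;
    uint32_t idx = pool_used++;
    pool[idx].value = value;
    pool[idx].next = head;
    return idx;
}

/* Walk the list; each step is an indexed load from pool[] rather than
   a plain pointer dereference. */
static int64_t list_sum(uint32_t head)
{
    int64_t sum = 0;
    for (uint32_t i = head; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}
```

The pool[i] accesses in list_sum are where the indexed addressing mode (or extra add) comes from, so this trades some latency per hop for half the node size.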

There are tricks that JVMs (for example) use to "compact" pointers in 64-bit mode. https://wiki.openjdk.java.net/display/HotSpot/CompressedOops - some kinds of pointers are stored as 32-bit values that must be left-shifted by 3 for use, because they point to 8-byte-aligned heap objects. This allows addressing 32GiB of space (2^32 values times 8 bytes).
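The shift-by-3 idea can be sketched in a few lines of C. This is not HotSpot's actual implementation, just an illustration of the encoding under the assumption that every object is 8-byte aligned relative to a known heap base; the function names and the explicit base parameter are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define OBJ_ALIGN_SHIFT 3   /* objects are assumed 8-byte aligned relative to the base */

/* Encode ptr as a 32-bit offset from base, counted in 8-byte units. */
static uint32_t compress_ptr(const void *base, const void *ptr)
{
    uintptr_t off = (uintptr_t)ptr - (uintptr_t)base;
    assert((off & 7u) == 0);                          /* low 3 bits must be zero */
    assert((off >> OBJ_ALIGN_SHIFT) <= UINT32_MAX);   /* must be within 32GiB of base */
    return (uint32_t)(off >> OBJ_ALIGN_SHIFT);
}

/* Decode: left-shift by 3 and add the heap base back. */
static void *decompress_ptr(void *base, uint32_t compressed)
{
    return (char *)base + ((uintptr_t)compressed << OBJ_ALIGN_SHIFT);
}
```

Every use of a compressed pointer pays for the shift and add in decompress_ptr, although on x86-64 a scaled-index addressing mode like [base + idx*8] can often absorb both.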

Upvotes: 2
