awpsoleet

Reputation: 257

x64 instructions that behave differently on different CPUs

During an interview I was asked whether I knew any x64 instructions that behave differently depending on the CPU used. I couldn't find any documentation on that anywhere. Does anyone know what these instructions are, and why this is the case?

Upvotes: 4

Views: 919

Answers (2)

Peter Cordes

Reputation: 364200

Some instructions leave a register or some flags with undefined values, and Intel and AMD may differ there.

In some cases, the actual behaviour of real hardware for these undefined cases preserves backwards compatibility for some old software that relies on it. For example, BSF with input=0 sets ZF and leaves the destination register unmodified. (This is true on both current Intel and AMD hardware. I don't know whether any old Intel hardware was ever different; if not, bsf/bsr isn't really an example of an instruction that executes differently, just a lack of documented guarantees of being future-proof.)

But the difference is that Intel documents it as leaving the destination register with "undefined" contents. AMD's manuals explicitly document and guarantee that AMD CPUs will leave the destination unmodified in that case.

AMD's AMD64 manual (March 2017) for bsr/bsf:
If the second operand contains 0, the instruction sets ZF to 1 and does not change the contents of the destination register

So it's not guaranteed on paper that it's safe to emulate tzcnt / implement std::countr_zero as mov eax, 32 / bsf eax, edx, even though that works in practice and will likely continue working on future CPUs. (This is why bsf / bsr have an output dependency.) Intel might eventually document this behaviour, in which case compilers will be able to use it for a more efficient countr_zero / countl_zero without BMI1. Intel did recently document that 16-byte aligned loads / stores are atomic on Intel CPUs that support AVX, so it's not unprecedented for a vendor to document something that their CPUs have been doing for years.
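A minimal sketch of the mov eax, 32 / bsf trick in C, assuming GCC or Clang on x86-64 (the name countr_zero32 is hypothetical). On other targets it falls back to a plain loop. The asm path relies on exactly the behaviour discussed above: guaranteed by AMD's manual, merely observed on Intel hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zeros, returning 32 for x == 0 (like TZCNT / std::countr_zero). */
static inline unsigned countr_zero32(uint32_t x) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* Equivalent of: mov eax, 32 / bsf eax, edx.
       Relies on BSF leaving its destination unmodified when the source is 0:
       documented by AMD, "undefined" per Intel, but what current Intel CPUs do. */
    unsigned r = 32;
    __asm__("bsf %1, %0" : "+r"(r) : "rm"(x) : "cc");
    return r;
#else
    /* Portable fallback with a fully defined result. */
    unsigned r = 0;
    if (x == 0) return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        r++;
    }
    return r;
#endif
}
```

The "+r" constraint is what tells the compiler that the old value of r (32) is an input to the asm, which mirrors the output dependency BSF has in hardware.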

If performance differences count, there are many (see the links in the x86 tag wiki)!


You're not just talking about unsupported instructions, are you? Like LAHF/SAHF being unsupported in long mode on some very early x86-64 CPUs? Or CMPXCHG16B, which was also unsupported on early K8?
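Code that has to run on such early CPUs is expected to check CPUID before using these instructions. A sketch assuming GCC or Clang and their <cpuid.h> header; the helper names are hypothetical, while the bit positions (CPUID.80000001H:ECX[0] for LAHF/SAHF in 64-bit mode, CPUID.01H:ECX[13] for CMPXCHG16B, CPUID.80000001H:ECX[5] for LZCNT) come from the vendors' CPUID documentation.

```c
#include <assert.h>
#include <stdbool.h>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_CPUID 1
#include <cpuid.h> /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Returns bit `bit` of ECX for CPUID leaf `leaf`, or false if unavailable. */
static bool cpuid_ecx_bit(unsigned leaf, unsigned bit) {
#ifdef HAVE_X86_CPUID
    unsigned a, b, c, d;
    /* __get_cpuid returns 0 when the leaf exceeds the CPU's maximum leaf. */
    if (!__get_cpuid(leaf, &a, &b, &c, &d)) return false;
    return (c >> bit) & 1u;
#else
    (void)leaf;
    (void)bit;
    return false; /* not x86, or no <cpuid.h> */
#endif
}

static bool has_lahf_sahf_64(void) { return cpuid_ecx_bit(0x80000001u, 0); }
static bool has_cmpxchg16b(void)   { return cpuid_ecx_bit(0x00000001u, 13); }
static bool has_lzcnt(void)        { return cpuid_ecx_bit(0x80000001u, 5); } /* AMD: ABM */
```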

The most interesting case of unsupported instructions is that LZCNT decodes as BSR on CPUs that don't support it, because the REP prefix is ignored. Even for non-zero inputs the results differ: LZCNT counts leading zeros, while BSR returns the index of the highest set bit (_lzcnt_u32(x) == 31-bsr(x)). TZCNT similarly decodes as (REP) BSF on CPUs that don't support it, but the two give the same result except when input = 0. I didn't mention this earlier, because running the same machine code differently is not the same thing as running the same instruction differently, but it sounds like this is the kind of thing you're asking for.
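A small portable model of the LZCNT/BSR relationship described above, written as plain C rather than intrinsics so it runs anywhere. The function names are hypothetical; lzcnt_encoding_without_lzcnt models what the LZCNT byte encoding produces on a CPU that lacks LZCNT and therefore executes it as BSR (keeping the old destination for a zero input, as current hardware does).

```c
#include <assert.h>
#include <stdint.h>

/* BSR semantics: index of the highest set bit. Precondition: x != 0. */
static unsigned bsr32(uint32_t x) {
    assert(x != 0);
    unsigned i = 31;
    while ((x >> i) == 0) i--;
    return i;
}

/* LZCNT semantics: number of leading zero bits, 32 for x == 0. */
static unsigned lzcnt32(uint32_t x) {
    return x == 0 ? 32 : 31 - bsr32(x);
}

/* The LZCNT encoding (REP BSR) on a CPU without LZCNT: REP is ignored,
   so it behaves as BSR, leaving the destination unchanged when x == 0. */
static uint32_t lzcnt_encoding_without_lzcnt(uint32_t x, uint32_t old_dst) {
    return x == 0 ? old_dst : bsr32(x);
}
```

For example, with x = 0x10 the bit index is 4, so LZCNT gives 27 while the same bytes on an older CPU give 4.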

Are we talking only about unprivileged instructions? There are probably many more differences in the behaviour of privileged instructions. For example, Intel and AMD each have different bugs in SYSRET that Linux has to work around to avoid malicious user-space being able to cause a kernel hang.
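One widely known workaround of this kind concerns SYSRET to a non-canonical return address: before taking the SYSRET path, a kernel can check that the user return RIP (which SYSRET takes from RCX) is canonical, and otherwise return with IRET instead. A minimal sketch of a check of that general shape, assuming 48-bit virtual addresses (4-level paging); both function names are hypothetical and this is not Linux's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, an address is canonical when
   bits 63..47 are all copies of bit 47. */
static bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47; /* bits 63..47: 17 bits */
    return top == 0 || top == 0x1FFFFu;
}

/* Hypothetical exit-path decision: only use SYSRET for a canonical
   return address; otherwise fall back to IRET. */
static bool can_use_sysret(uint64_t user_rip) {
    return is_canonical48(user_rip);
}
```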


Another case that I'm not sure counts: PREFETCHW runs on Intel CPUs from at least Core2 to Haswell as a NOP, but on AMD CPUs (and Intel since Broadwell) as an actual prefetch.

So some CPUs run it as a NOP and some run it as a prefetch (with no architecturally visible effect either way), except on ancient CPUs where it faults as an illegal instruction. 64-bit Windows 8.1 apparently requires that PREFETCHW can run without faulting (which stops it from running on (some?) 64-bit Pentium 4 CPUs).
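From C, a prefetch-for-write hint is usually written with the GCC/Clang builtin __builtin_prefetch(addr, 1, locality). A sketch assuming GCC or Clang; the function name and the 16-element prefetch distance are arbitrary choices for illustration. Depending on the target options the compiler may emit PREFETCHW, a plain prefetch, or nothing, and since a prefetch has no architecturally visible effect, the result is the same in every case.

```c
#include <assert.h>
#include <stddef.h>

/* Add 1 to every element, hinting that elements a little further
   ahead are about to be written. */
static void increment_all(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 1 /* for write */, 3 /* high locality */);
        a[i] += 1;
    }
}
```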

Upvotes: 6

phuclv

Reputation: 41794

There are a lot of differences between Intel and AMD:

  • Intel 64's BSF and BSR instructions act differently than AMD64's when the source is zero and the operand size is 32 bits. An Intel 64 processor sets the zero flag and leaves the upper 32 bits of the destination undefined.
  • Intel 64 lacks some MSRs that are considered architectural in AMD64. These include SYSCFG, TOP_MEM, and TOP_MEM2.
  • Intel 64 allows SYSCALL/SYSRET only in 64-bit mode (not in compatibility mode),[33] and allows SYSENTER/SYSEXIT in both modes.[34] AMD64 lacks SYSENTER/SYSEXIT in both sub-modes of long mode.[35]
  • In 64-bit mode, near branches with the 66H (operand size override) prefix behave differently. Intel 64 ignores this prefix: the instruction has a 32-bit sign-extended offset, and the instruction pointer is not truncated. AMD64 uses a 16-bit offset field in the instruction, and clears the top 48 bits of the instruction pointer.
  • AMD processors raise a floating point Invalid Exception when performing an FLD or FSTP of an 80-bit signalling NaN, while Intel processors do not.
  • When returning to a non-canonical address using SYSRET, AMD64 processors execute the general protection fault handler in privilege level 3, while on Intel 64 processors it is executed in privilege level 0.[38][39]
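The near-branch difference in the list above can be modelled as two target calculations. A sketch that follows the bullet's description; the function names are hypothetical, and next_rip is the address of the following instruction. Note that the same bytes (66 E9 …) decode to different lengths on the two vendors (a 4-byte versus a 2-byte displacement), so next_rip differs as well.

```c
#include <assert.h>
#include <stdint.h>

/* Intel 64: prefix ignored; 32-bit displacement, sign-extended,
   RIP not truncated. */
static uint64_t near_branch_target_intel(uint64_t next_rip, int32_t rel32) {
    return next_rip + (uint64_t)(int64_t)rel32;
}

/* AMD64: 16-bit displacement, and the top 48 bits of RIP are cleared. */
static uint64_t near_branch_target_amd(uint64_t next_rip, int16_t rel16) {
    return (next_rip + (uint64_t)(int64_t)rel16) & 0xFFFFu;
}
```

For a branch placed at 0x401000, Intel's 6-byte decode with displacement 0x10 lands at 0x401016, whereas AMD's 4-byte decode with the same displacement lands at 0x1014.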

https://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64

Upvotes: 4
