MrDrMcCoy
MrDrMcCoy

Reputation: 366

Scan binary for CPU feature usage

I am debugging an application that runs properly on an Intel CPU, but not on another, newer AMD processor. I suspect that it may have been compiled to use certain Intel-specific instructions, which leads to the crashes. However, I am looking for a way to verify this. I do not have access to the original source code.

Is there a tool that can scan a binary and list which CPU-specific features it may use?

Upvotes: 3

Views: 1082

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 364190

There are two good approaches:

  • Run under a debugger and look at instruction that caused an illegal-instruction fault
  • Run under a simulator/emulator that can show you an instruction mix, like SDE.

But your idea, statically scanning the binary, can't distinguish code in functions that are only called after checking cpuid.


Using a debugger to look at the faulting instruction

Pick any debugger. GDB is easy to install on any Linux distro, and probably also on Windows or Mac (or lldb there). Or pick any other debugger, e.g. one with a GUID.

Run the program. Once it faults, use the debugger to examine the faulting instruction.

Look it up in Intel or AMD's x86 asm reference manual, e.g. https://www.felixcloutier.com/x86/ is an HTML scrape of Intel's PDFs. See which ISA extension this form of this instruction requires.

For example, this source can compile to use AVX-512 instructions if you let the compiler do so, but only needs SSE2 to compile in the first place.

#include <immintrin.h>
// stores to global vars typically aren't optimized out, even without volatile
int buf[16];
int main(int argc, char **argv)
{
        __m128i v = _mm_set1_epi32(argc);     // broadcast scalar to vector
        _mm_storeu_si128((__m128i*)buf, v);
}

(See it on Godbolt with different compile options.)

Build with gcc -march=skylake-avx512 -O3 ill.c.
Then try to run it, e.g. on my Skylake-client (non-AVX512) GNU/Linux desktop. (I also used strip a.out to remove the symbol table (function names), like a binary-only software release).

$ ./a.out 
Illegal instruction (core dumped)
$ gdb a.out
...
(gdb) run
Starting program: /tmp/a.out 

Program received signal SIGILL, Illegal instruction.
0x0000555555555020 in ?? ()

(gdb) disas
No function contains program counter for selected frame.

(gdb) disas /r $pc,+20            # from current program counter to +20 bytes
Dump of assembler code from 0x555555555020 to 0x555555555034:
=> 0x0000555555555020:  62 f2 7d 08 7c c7       vpbroadcastd xmm0,edi
   0x0000555555555026:  c5 f9 7f 05 32 30 00 00 vmovdqa XMMWORD PTR [rip+0x3032],xmm0        # 0x555555558060
   0x000055555555502e:  31 c0   xor    eax,eax
   0x0000555555555030:  c3      ret    
   0x0000555555555031:  66 2e 0f 1f 84 00 00 00 00 00   cs nop WORD PTR [rax+rax*1+0x0]
End of assembler dump.

The => indicates the current program counter (RIP in x86-64, but GDB portably defines $pc as an alias on any ISA.)

So we faulted on vpbroadcastd xmm0,edi. (The way GCC implemented _mm_set1_epi32(argc) when we told it AVX512 was available.)

That doesn't involve memory access, and the fault was illegal-instruction not segmentation-fault anyway, so we can be sure that actually trying to execute an unsupported instruction was the direct cause of the crash here. (It's also possible for it to be an indirect cause, e.g. a program using lzcnt eax, ecx but an old CPU running it as bsr eax, ecx, and then using that different integer as an array index. lzcnt/bsr is somewhat unlikely for your case since AMD has supported it for longer than Intel.)

So let's check on vpbroadcastd: there are multiple entries for vpbroadcast in Intel's manual:

If the mnemonic starts with v and you can't find an entry, e.g. vaddps, that's because the instruction existed before AVX, and is documented under its legacy-SSE mnemonic, like SSE1 addps which does list both addps and vaddps encodings, including the AVX-512 encodings that allow ZMM registers, x/ymm16..31, and masking like vaddps ymm0{k3}{z}, ymm1, ymm2. That's an AVX-512F+VL instruction.

Anyway, back to our example. The table entry that matches the faulting instruction was the following. Note the 7C opcode byte before the ModR/M (/r) that encodes the operands. That's present after the 4-byte EVEX prefix, as a cross-check that this is indeed the opcode we're looking for.

EVEX.128.66.0F38.W0 7C /r VPBROADCASTD xmm1 {k1}{z}, r32

It requires "AVX512VL AVX512F" according to the table. The {k1}{z} is optional masking. r32 is a 32-bit general-purpose integer register, like edi in this case. xmm1 means any XMM register can be the first xmm operand to this instruction; in this case GCC chose XMM0.

My CPU doesn't have AVX-512 at all, so it faulted.


SDE instruction mix

This should work equally well on Windows or any other OS.

Intel's SDE (Software Development Emulator) has a -mix option, whose output includes categorizing by required ISA extension. See How do I monitor the amount of SIMD instruction usage re: using it.

Using the same example a.out I used with GDB:

Running /opt/sde-external-8.33.0-2019-02-07-lin/sde64 -mix -- ./a.out created a file sde-mix-out.txt which contained a lot of stuff, including stats for how often different basic blocks were executed. (Some in the dynamic linker ran many times.) IDK if there's an option to omit that, because it would get pretty bloated for a large program, I expect. I think it might only print the top few blocks, even if there are many more.

Then we get to the part we want:

...
# END_TOP_BLOCK_STATS
# EMIT_DYNAMIC_STATS FOR TID 0  OS-TID 1168465 EMIT #1
#
# $dynamic-counts
#
# TID 0
#       opcode                 count
#
*stack-read                                                         8806
*stack-write                                                        8314
*iprel-read                                                         1003
*iprel-write                                                         437

...

*isa-ext-AVX                                                           4
*isa-ext-AVX2                                                          5
*isa-ext-AVX512EVEX                                                    1
*isa-ext-BASE                                                     133338
*isa-ext-LONGMODE                                                    545
*isa-ext-SSE                                                          56
*isa-ext-SSE2                                                       2560
*isa-ext-XSAVE                                                         1
*isa-set-AVX                                                           4
*isa-set-AVX2                                                          5
*isa-set-AVX512F_128                                                   1
*isa-set-CMOV                                                        266
*isa-set-FAT_NOP                                                     891
*isa-set-I186                                                       2676
*isa-set-I386                                                       7626
*isa-set-I486REAL                                                     71
*isa-set-I86                                                      121192
*isa-set-LONGMODE                                                    545
*isa-set-PENTIUMREAL                                                   8
*isa-set-PPRO                                                        608
*isa-set-SSE                                                          56
*isa-set-SSE2                                                       2560
*isa-set-XSAVE                                                         1

The 1 count for isa-set-AVX512F_128 is the instruction that would have faulted on my CPU, which doesn't support AVX-512 at all. AVX512F_128 is AVX512F (foundation) + AVX512VL (vector length, allowing vectors other than 512-bit ZMM registers).

(It was also counted as isa-ext-AVX512EVEX. EVEX is the machine-code prefix for AVX-512 vector instructions. AVX-512 mask instructions like kandw k0, k1, k2 use VEX encoding, like AVX1/AVX2 SIMD instructions. But this wouldn't distinguish an Ice Lake new instruction like vpermb faulting on a Skylake-server CPU that supports AVX-512F but not AVX512VBMI)

Everything other than AVX-512 is probably simpler, since there's a fully separate name for each extension.


Static disassembly

You can disassemble most binaries; if they're not obfuscated then disassembly should find all the instructions that might ever execute. (And high-performance code that uses new instructions is unlikely to be using hacks that throw off a disassembler, like jumping into the middle of what straight-line disassembly would see as a different instruction; x86 machine code is a byte-stream of variable-length instructions.)

But that doesn't tell you which instructions actually do execute; some might be in functions that are only called after checking CPUID to find out if the necessary extensions are supported.

(And I don't know of a tool to categorize them by ISA extension, although I've never looked for one; usually developers wanting to make sure they didn't use AVX2 instructions in code that will run on AVX1-only CPUs use build-time checks, or test by running under an emulator or on a real CPU.)

Upvotes: 1

Related Questions