y.selivonchyk

Reputation: 9900

How does one get AVX512_FP16 flag support?

My CPU supports all sorts of things

  -march=CPU[,+EXTENSION...]
                          generate code for CPU and EXTENSION, CPU is one of:
                           generic32, generic64, i386, i486, i586, i686,
                           pentium, pentiumpro, pentiumii, pentiumiii, pentium4,
                           prescott, nocona, core, core2, corei7, l1om, k1om,
                           iamcu, k6, k6_2, athlon, opteron, k8, amdfam10,
                           bdver1, bdver2, bdver3, bdver4, znver1, btver1,
                           btver2
                          EXTENSION is combination of:
                           8087, 287, 387, 687, mmx, sse, sse2, sse3, ssse3,
                           sse4.1, sse4.2, sse4, avx, avx2, avx512f, avx512cd,
                           avx512er, avx512pf, avx512dq, avx512bw, avx512vl,
                           vmx, vmfunc, smx, xsave, xsaveopt, xsavec, xsaves,
                           aes, pclmul, fsgsbase, rdrnd, f16c, bmi2, fma, fma4,
                           xop, lwp, movbe, cx16, ept, lzcnt, hle, rtm, invpcid,
                           clflush, nop, syscall, rdtscp, 3dnow, 3dnowa,
                           padlock, svme, sse4a, abm, bmi, tbm, adx, rdseed,
                           prfchw, smap, mpx, sha, clflushopt, prefetchwt1, se1,
                           clwb, avx512ifma, avx512vbmi, avx512_4fmaps,
                           avx512_4vnniw, avx512_vpopcntdq, clzero, mwaitx,
                           ospke, rdpid, ptwrite, cet, no87, no287, no387,
                           no687, nommx, nosse, nosse2, nosse3, nossse3,
                           nosse4.1, nosse4.2, nosse4, noavx, noavx2, noavx512f,
                           noavx512cd, noavx512er, noavx512pf, noavx512dq,
                           noavx512bw, noavx512vl, noavx512ifma, noavx512vbmi,
                           noavx512_4fmaps, noavx512_4vnniw, noavx512_vpopcntdq

Yet something as simple as __m256h inter; yields an error: '__m256h' was not declared in this scope. That makes sense, since the CPU requirement is the CPUID flags AVX512_FP16 + AVX512VL, and AVX512_FP16 is not on the list.

How does one get AVX512_FP16 support? Is it CPU version dependent or can it be fixed with a patch?

Update: Intel mentions that AVX512_FP16 is only supported alongside AVX512BW [check]. I am compiling with -march=skylake-avx512, which compiles regular __m512 code but fails specifically on these FP16-based ops.

Upvotes: 0

Views: 3176

Answers (1)

FCLC

Reputation: 81

Main Answer

Because AVX512FP16 is an extension to the AVX512 ISA, it must either:

A) Have explicit hardware support built in.

B) Be emulated in software by promoting the type to another suitable alternative such as fp32 with specific rounding/conformance code.
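To make option B concrete, here is a minimal sketch of what such emulation could look like: the half-precision values are held as raw 16-bit patterns, promoted to float, combined there, and rounded back with round-to-nearest-even. The function names (half_to_float, float_to_half, emulated_fma_half) are made up for this illustration and are not from any library. Note the assumption that float is IEEE 754 binary32, and that the result is rounded twice (once in float, once to half), so it is not guaranteed to be bit-identical to a true fused FP16 FMA in every case.

```c
#include <stdint.h>
#include <string.h>

/* Widen an IEEE 754 binary16 bit pattern to float. This is always exact. */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: the value is man * 2^-24, exact in float. */
        f = (float)man * (1.0f / 16777216.0f);
        return sign ? -f : f;
    }
    if (exp == 0x1Fu)
        bits = sign | 0x7F800000u | (man << 13);          /* Inf or NaN */
    else
        bits = sign | ((exp + 112u) << 23) | (man << 13); /* rebias 15 -> 127 */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow a float to a binary16 bit pattern, rounding to nearest, ties to even. */
uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t mag  = x & 0x7FFFFFFFu;
    uint32_t half, rem, halfway;

    if (mag > 0x7F800000u)  return (uint16_t)(sign | 0x7E00u); /* NaN -> quiet NaN */
    if (mag >= 0x47800000u) return (uint16_t)(sign | 0x7C00u); /* Inf or >= 65536 */

    if (mag >= 0x38800000u) {
        /* Normal half range (>= 2^-14): rebias the exponent, drop 13 bits. */
        mag -= 0x38000000u;
        half = mag >> 13;
        rem = mag & 0x1FFFu;
        halfway = 0x1000u;
    } else {
        /* Subnormal half range: shift the full 24-bit significand down. */
        uint32_t shift = 126u - (mag >> 23);
        if (shift > 24u) return (uint16_t)sign;  /* magnitude below 2^-25 -> zero */
        uint32_t man = (mag & 0x7FFFFFu) | 0x800000u;
        half = man >> shift;
        rem = man & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }
    /* A carry out of the mantissa correctly bumps the exponent (up to Inf). */
    if (rem > halfway || (rem == halfway && (half & 1u))) half++;
    return (uint16_t)(sign | half);
}

/* a * b + c on half bit patterns, computed in float and rounded back to half. */
uint16_t emulated_fma_half(uint16_t a, uint16_t b, uint16_t c) {
    float r = half_to_float(a) * half_to_float(b) + half_to_float(c);
    return float_to_half(r);
}
```

For example, emulated_fma_half(0x4000, 0x3C00, 0x3800) computes 2.0 * 1.0 + 0.5 and returns the bit pattern for 2.5. A compiler that supports _Float16 without the hardware extension does something broadly similar behind the scenes, though its exact conversion sequence may differ.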

As of the time of your posting, there were no systems on the market with AVX512FP16 support.

As of this posting (Feb 10 2022), the only in-market support is the AVX512 P(erformance)-core workaround for Intel 12th generation K-series Alder Lake CPUs*.

These P-cores, based on the Golden Cove architecture, support AVX512FP16*.

To use the instruction in C or C++ a very recent compiler must be used. My own testing shows that GCC-12, Clang-14 and ICX 2022.0 are all capable of utilizing the instruction.
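As a rough sketch of how to tell whether a given build actually targets the extension: GCC and Clang predefine the __AVX512FP16__ macro when AVX512-FP16 code generation is enabled (for example via -mavx512fp16), so the intrinsics can be guarded with it. The function names fma32_ph and built_with_avx512fp16 below are hypothetical, chosen only for this example; the intrinsics themselves come from immintrin.h.

```c
#if defined(__AVX512FP16__)
#include <immintrin.h>

/* omega[i] = omega[i] * alpha[i] + delta[i] for 32 half-precision lanes,
   done with a single 512-bit FMA. Only compiled when the compiler is
   targeting AVX512-FP16; unaligned loads/stores are used for simplicity. */
void fma32_ph(_Float16 *omega, const _Float16 *alpha, const _Float16 *delta) {
    __m512h o = _mm512_loadu_ph(omega);
    __m512h a = _mm512_loadu_ph(alpha);
    __m512h d = _mm512_loadu_ph(delta);
    _mm512_storeu_ph(omega, _mm512_fmadd_ph(o, a, d));
}
#endif

/* Returns 1 if this translation unit was built with AVX512-FP16 enabled,
   0 otherwise. This says nothing about the CPU the binary later runs on. */
int built_with_avx512fp16(void) {
#if defined(__AVX512FP16__)
    return 1;
#else
    return 0;
#endif
}
```

Built without the flag, or with a compiler that predates the extension, the guarded function simply disappears and built_with_avx512fp16() returns 0, which is an easy way to confirm that the toolchain and flags are the missing piece.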

If you'd like to use an officially supported platform, the option is to wait for Intel Xeon Sapphire Rapids, which is based solely on Golden Cove cores and will have the full AVX512 ISA enabled.

A snippet of code that will compile to use the FMA instructions from the AVX512FP16 ISA extension is at the end, with instructions on its usage.

Note on using Alder Lake

*NB: This capability can only be enabled once the Gracemont E-cores are disabled, and only on specific vendors' motherboards with specific BIOS/microcode revisions. This is not sanctioned or supported by Intel.

The reason for this mainly has to do with the differing ISAs of the Gracemont and Golden Cove cores and with process pinning (but that is beyond the scope of this question).
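Since availability depends on the exact CPU and firmware state, it can help to ask the processor directly. The following is a minimal sketch, assuming GCC or Clang on x86 (it relies on their cpuid.h helper __get_cpuid_count); the function name cpu_reports_avx512fp16 is made up for this example. It checks the two CPUID flags mentioned in the question, AVX512_FP16 and AVX512VL, in leaf 7, subleaf 0. It deliberately does not check whether the OS has enabled the AVX-512 register state, which a production check would also need.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID reports both AVX512_FP16 and AVX512VL, 0 otherwise
   (including on non-x86 targets, where the check is compiled out). */
int cpu_reports_avx512fp16(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 7, subleaf 0 holds the structured extended feature flags.
       The helper returns 0 if the CPU does not implement this leaf. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;

    int has_vl   = (int)((ebx >> 31) & 1u);  /* AVX512VL:    EBX bit 31 */
    int has_fp16 = (int)((edx >> 23) & 1u);  /* AVX512_FP16: EDX bit 23 */
    return has_vl && has_fp16;
#else
    return 0;
#endif
}
```

On an Alder Lake system this would be expected to report 0 while the E-cores are enabled, and on the skylake-avx512 class of CPUs it reports 0 regardless of compiler flags, because the hardware lacks the instructions.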

Code example

Use gcc-12 fp16_FMA_avx512.c -O3 -march=sapphirerapids -mavx512fp16 -o avx512example.bin to generate an executable, if your platform supports the instruction.

Use gcc-12 fp16_FMA_avx512.c -O3 -march=sapphirerapids -mavx512fp16 -o avx512example.S -S to generate an assembly file that shows the usage of the instructions themselves.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    /*
    Simple example of FP16 arithmetic with its declaration

    NB: This uses the Clang/GCC convention for FP16 declarations due to near universal platform support.
    Any compiler that has yet to formally adopt ISO/IEC TS 18661-3:2015 (“Floating-point extensions for C”) will not support the type.
    Known working x86_64 compilers as of Feb 08 2022 are:
        Clang/LLVM-14+
        GCC-12+
        Intel ICX Version 2022.0.0

    Known working architectures:
        Intel Alder Lake [*under certain conditions]
        Intel Sapphire Rapids
    */

    int main(void) {
        float seed = 1;
        srand((unsigned) time(0)); // kept from the original; rand() is not called below
        int count = 31;
        _Float16 factor = seed;

        // primaries
        _Float16 a = 1.436;
        _Float16 b = 0.83546;

        // arrays to be used for FMA
        _Float16 alpha[32];
        _Float16 delta[32];
        _Float16 omega[32];

        // fill the arrays with differing values, from index 31 down to 0
        while (count >= 0) {
            alpha[count] = (_Float16) (a * factor);
            delta[count] = (_Float16) (b * factor);
            omega[count] = (_Float16) (factor + (a * b));

            factor = factor + b;
            count--;
        }

        printf("Print the FMA of 3 _Float16's that are cast as float\n");

        // the fill loop leaves count at -1, so restart at 0 to stay inside the arrays
        count = 0;
        while (count < 32) {
            omega[count] = (omega[count] * alpha[count]) + delta[count];
            count++;
        }
        printf("\n"); // clear last line

        // count is 32 here (one past the end), so print from the last valid index
        count = 31;
        while (count >= 0) {
            printf("%i %f \n", count, (float) omega[count]);
            count--;
        }

        // 32 entry variable can be used: 512bit/16bits per variable = 32 variables
        // c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae af ag ah
        return 0;
    }

Upvotes: 6
