For this code:
#include <stdfloat>

std::bfloat16_t foo(std::float32_t f)
{
    return f;
}
GCC generates this code:
foo(_Float32):
        sub     rsp, 8
        call    __truncsfbf2
        add     rsp, 8
        ret
Here we see call __truncsfbf2, which is (?) a software implementation (libgcc/soft-fp/truncsfbf2.c).
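To illustrate what a software float32-to-bfloat16 truncation involves, here is a minimal sketch. It is not the libgcc code; float_to_bf16_bits is a hypothetical helper, and it assumes round-to-nearest-even and that float is IEEE-754 binary32:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT the libgcc implementation: returns the 16-bit
// bfloat16 bit pattern for a binary32 value, rounding to nearest, ties to even.
std::uint16_t float_to_bf16_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bits

    // NaN: force a quiet-NaN mantissa bit so that dropping the low 16 bits
    // cannot turn the value into an infinity.
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // bfloat16 keeps the upper 16 bits of binary32. Adding 0x7FFF plus the
    // lowest kept bit rounds to nearest and breaks ties toward even.
    std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}
```

A hardware conversion instruction would replace this integer sequence with a single operation, which is why the library call stands out in the generated code.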
It is known that:
Hence, when targeting Cascade Lake, I expect to see an Intel DL Boost bfloat16 instruction VCVT... (instead of call __truncsfbf2).
I've already tried adding -march=cascadelake. However, GCC still generates call __truncsfbf2.
Are there any GCC options to generate Intel DL Boost bfloat16 instructions?
The same question goes for Clang.