Reputation: 41
I'm considering what has to be done when migrating some software to the cloud. The software uses a lot of Intel SIMD intrinsics, from SSE3 up to AVX. It works well on a local server. I am wondering what kind of changes are needed to migrate it to the cloud. Ideally it would keep using the SIMD functions with as few changes as possible. However, it seems impossible to predict what kind of CPU will be used when it is running in the cloud, and I doubt whether it is possible to use low-level features of a specific CPU when the software is running in some kind of virtual machine or container.
Upvotes: 4
Views: 1378
Reputation: 364448
Yes, it's easier to use SIMD on cloud servers than in an application you're going to distribute to people's desktops, because you usually have more control over what hardware your code will run on. (Depending on the cloud host, you can know the hardware pretty exactly, just as you do when running on your current private server.)
Inside a virtual machine, the machine code in your compiled executables is still running natively on (usually) x86 CPUs, usually Intel Xeon but possibly AMD servers.
Some VM software may be set up not to expose AVX, but any x86 cloud host will have SSE4.2 at least. SSE2 is baseline for x86-64, so failing to expose that is not an option. CPUs so old that they only have SSE4.1 or SSSE3 will probably have been retired long ago as not worth the power it takes to run them.
The main thing that's missing from most VMs / cloud hosting is HW performance counters. So you'll have a hard time profiling to tune for the cloud server with Linux perf record or perf stat for any event like cache misses, or even cycles. perf might have some time-based sampling, and other profiling tools are designed for time-based sampling instead of HW perf counters.
Google Cloud compute servers, for example, let you choose what kind of hardware your instances will run on, e.g. Haswell or Skylake-X. With either of those, you have AVX2 and FMA available. (And BMI2, popcnt, etc.) With Skylake-X, you also have AVX512BW / AVX512DQ / a few other AVX512 flavours. Compile with clang/gcc -O3 -march=skylake-avx512 or -march=haswell as appropriate.
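To see what a given -march setting actually lets the compiler assume, you can check the feature macros that GCC and clang predefine. This is only a sketch: the function name compiled_simd_baseline is made up for illustration, and other compilers (e.g. MSVC) don't define all of these macros.

```c
/* Reports the widest SIMD level the compiler was allowed to assume at
   build time (set by -march=... or -mavx2 etc.). This says nothing
   about the CPU the binary later runs on.
   compiled_simd_baseline is a hypothetical helper name. */
const char *compiled_simd_baseline(void)
{
#if defined(__AVX512BW__) && defined(__AVX512DQ__)
    return "AVX-512 (BW+DQ)";     /* e.g. -march=skylake-avx512 */
#elif defined(__AVX2__) && defined(__FMA__)
    return "AVX2+FMA";            /* e.g. -march=haswell */
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE2__)
    return "SSE2";                /* baseline for x86-64 */
#else
    return "none (or not x86)";
#endif
}
```

Printing this string at startup is a cheap way to confirm that the build flags you intended actually reached the compiler.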
If being able to assume AVX+FMA is important for your software, I assume other cloud hosts have similar mechanisms for letting you pick at least a minimum baseline set of ISA extensions. I'd expect it's very easy to find AVX as a minimum, and probably also easy to find Haswell as a minimum. (AVX2 + FMA + BMI1/BMI2). -march=haswell is a useful baseline compile target.
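If you do build for a Haswell baseline, a startup check can fail early with a clear message instead of dying later on an illegal instruction. Here's a minimal sketch using the GCC/clang builtin __builtin_cpu_supports. The function name cpu_meets_haswell_baseline is hypothetical, and the feature list is an assumption about what matters most; it isn't everything that -march=haswell enables.

```c
/* Returns 1 if the CPU, as seen from inside the guest, reports the main
   extensions that -march=haswell code relies on, else 0.
   Assumes GCC or clang; on non-x86 targets it simply returns 0. */
int cpu_meets_haswell_baseline(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __builtin_cpu_supports("avx2")
        && __builtin_cpu_supports("fma")
        && __builtin_cpu_supports("bmi")
        && __builtin_cpu_supports("bmi2")
        && __builtin_cpu_supports("popcnt");
#else
    return 0;
#endif
}
```

Call it once at the top of main() and exit with an error message if it returns 0.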
Hypervisors support migrating VMs between physical machines, but they will never migrate a guest to a host that lacks features the guest started with. (This is one reason a host might not pass through AVX, or might advertise an older SSE or AVX level than the CPU actually has.)
AVX and AVX512 add new architectural state (new / wider registers), and thus require new save/restore support on context switches. Without the right bits set by the OS / VM in control registers, AVX instructions will fault. So a VM can fully stop a guest from using AVX at all. But since SSE2 has to be enabled, they can't stop you from using SSE4.2 if the HW supports it. The guest VM might be set up so CPUID only advertizes SSE2 but no higher, but they can't make SSE4.2 instructions fault while SSE2 instructions work. Same for AVX2+FMA: if AVX1 is enabled, only an underlying CPU that really doesn't support AVX2 or FMA can make them fault, not a CPUID artificial limit. But not advertizing FMA might mean that your VM could migrate at any time to HW that doesn't support it.
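If you can't pin down the hardware, runtime dispatch keeps one binary working either way. The sketch below assumes GCC or clang on x86; the function names (add_f32 and friends) are made up for illustration. It compiles one function for AVX with a target attribute and only calls it when __builtin_cpu_supports("avx") says so. In libgcc and compiler-rt that check also looks at whether the OS (or hypervisor) has enabled AVX state, not just at the CPUID feature bit.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX path: the target attribute lets this one function use AVX even
   when the rest of the file is built for plain x86-64 (SSE2). */
__attribute__((target("avx")))
static void add_f32_avx(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {           /* 8 floats per 256-bit vector */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)                      /* scalar tail */
        dst[i] = a[i] + b[i];
}
#endif

/* Portable fallback for CPUs or guests without usable AVX. */
static void add_f32_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Dispatcher: picks the AVX path only if the running CPU reports it. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        add_f32_avx(dst, a, b, n);
        return;
    }
#endif
    add_f32_scalar(dst, a, b, n);
}
```

For hot code you'd normally do the check once and cache a function pointer rather than testing on every call.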
Intel still makes CPUs that don't have AVX, in their Silvermont / Goldmont line. Some of these are used in low-power servers, but I think that's rare for most cloud stuff. (Intel also sells Skylake Celeron/Pentium CPUs without AVX, but you won't find those in cloud hosts.)
Other than that, Sandybridge was new in about 2011, and AMD introduced Bulldozer around the same time. So any mainstream CPUs that physically lack AVX support are very much obsolete, and wouldn't have the memory bandwidth and CPU power to be worth the electricity costs for most hosters.
Upvotes: 8