Reputation: 3284
I was looking for a SIMD option to speed up comparisons and I found the function `__m128d _mm_cmpgt_sd (__m128d a, __m128d b)`. Apparently it compares the lower double, and copies the higher double from `a` into the output. What it is doing makes sense, but what's the point? What problem is this trying to solve?
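For reference, here is a minimal sketch of the behaviour as I understand it (the values are arbitrary, chosen only for illustration):

```c
#include <emmintrin.h>  // SSE2
#include <stdio.h>

int main(void)
{
    __m128d a = _mm_set_pd(7.5, 2.0);   // high = 7.5, low = 2.0
    __m128d b = _mm_set_pd(1.0, 3.0);   // high = 1.0, low = 3.0

    // Low lane: 2.0 > 3.0 is false -> all-zero bits (reads back as 0.0).
    // High lane: copied unchanged from a (7.5), not compared against b.
    __m128d r = _mm_cmpgt_sd(a, b);

    double out[2];
    _mm_storeu_pd(out, r);
    printf("low = %f, high = %f\n", out[0], out[1]);  // low = 0.000000, high = 7.500000
    return 0;
}
```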
Upvotes: 3
Views: 414
Reputation: 364458
`cmpsd` is an instruction that exists in asm and operates on XMM registers, so it would be inconsistent not to expose it via intrinsics.
(Almost all packed-FP instructions (other than shuffles/blends) have a scalar version, so again there's a consistency argument for ISA design; it's just an extra prefix to the same opcode, and it might even take more transistors to special-case that opcode as not supporting a scalar version.)

Whether or not you or the people designing the intrinsics API could think of a reasonable use-case is not at all the point. It would be foolish to leave things out on that basis; if the intrinsic were missing, anyone who later came up with a use-case would have to fall back to inline asm or write C that compiles to more instructions.
Perhaps someone someday will find a use-case for a vector with a mask as the low half, and a still-valid double in the high half. e.g. maybe ANDing the mask back onto the input with `_mm_and_ps` to conditionally zero just the low element, without needing a packed compare in the high element to produce true. Or consider that all-ones is a bit-pattern for NaN, and all-zero is the bit-pattern for `+0.0`.
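A hedged sketch of that idea (the function name and the choice of `_mm_and_pd` here are mine, not anything standard):

```c
#include <emmintrin.h>

// Use the scalar compare mask to conditionally zero only the low element,
// while the high double passes through untouched and stays a valid value.
static inline __m128d zero_low_unless_greater(__m128d a, __m128d b)
{
    // Low lane: all-ones if a[0] > b[0], else all-zeros.
    // High lane: copied from a, so it is still a normal double.
    __m128d mask = _mm_cmpgt_sd(a, b);

    // AND the mask back onto a: the low element survives only if the compare
    // was true; the high element is ANDed with itself, i.e. unchanged.
    return _mm_and_pd(mask, a);
}
```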
IIRC, `cmppd` slows down if any of the elements are subnormal (if you don't have the DAZ bit set in MXCSR). At least on some older CPUs that existed when the ISA was being designed. So for FP compares, having scalar versions is (or was) essential for avoiding spurious FP assists for elements you don't care about.

Also for avoiding spurious FP exceptions (or setting exception flags if they're masked), like if there's a NaN in the upper element of either vector.
@wim also makes a good point that Intel CPUs before Core 2 decoded 128-bit SIMD instructions to 2 uops, one for each 64-bit half. So using `cmppd` when you don't need the high-half result would always be slower, even if it can't fault. Lots of multi-uop instructions can easily bottleneck the front-end decoders on CPUs without a uop cache, because only one of the decoders can handle them.
You don't normally use intrinsics for FP scalar instructions like `cmpsd` or `addsd`, but they exist in case you want them (e.g. as the last step in a horizontal sum). More often you just leave it to the compiler to use scalar versions of instructions when compiling scalar code without auto-vectorization.
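For example, a sketch of the horizontal-sum case (my own illustrative helper, not from any particular library), where the final add is the scalar `_mm_add_sd` / `addsd`:

```c
#include <emmintrin.h>

// Horizontal sum of both doubles in a __m128d, finishing with a scalar add.
static inline double hsum_pd(__m128d v)
{
    // Bring the high element down into the low lane of a second vector.
    __m128d high = _mm_unpackhi_pd(v, v);
    // Scalar add: only the low lanes are summed; the high lane is ignored.
    return _mm_cvtsd_f64(_mm_add_sd(v, high));
}
```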
And often for scalar compares, compilers will want the result in EFLAGS so will use `ucomisd` instead of creating a compare mask, but for branchless code a mask is often useful, e.g. for `a < b ? c : 0.0` with `cmpsd` and `andpd`. (Or really `andps` because it's shorter and does the same thing as the pointless `andpd`.)
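A minimal sketch of that branchless select (assuming the inputs start out as plain scalars; the compiler may well pick `andps` over `andpd` for the AND):

```c
#include <emmintrin.h>

// Branchless a < b ? c : 0.0 using a scalar compare mask and a bitwise AND.
static inline double select_c_or_zero(double a, double b, double c)
{
    __m128d va = _mm_set_sd(a);
    __m128d vb = _mm_set_sd(b);
    __m128d vc = _mm_set_sd(c);

    // Low lane: all-ones if a < b, else all-zeros (cmpsd with the LT predicate).
    __m128d mask = _mm_cmplt_sd(va, vb);

    // mask & c: keeps c where the compare was true, +0.0 (all-zero bits) otherwise.
    return _mm_cvtsd_f64(_mm_and_pd(mask, vc));
}
```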
Upvotes: 3
Reputation: 3968
The point is probably that on very old hardware, such as the Intel Pentium II and Pentium III, `_mm_cmpgt_sd()` is faster than `_mm_cmpgt_pd()`. See Agner Fog's instruction tables. These processors (PII and PIII) only have a 64-bit wide floating point unit, so 128-bit wide SSE instructions are executed as two 64-bit micro-ops on them. On newer CPUs (such as Intel Core 2 (Merom) and later) the `_pd` and `_ps` versions are as fast as the `_sd` and `_ss` versions. So, you might prefer the `_sd` and `_ss` versions if you only have to compare a single element and don't care about the upper 64 bits of the result.
Moreover, `_mm_cmpgt_pd()` may raise a spurious floating point exception, or suffer from degraded performance, if the upper garbage bits accidentally contain a NaN or a subnormal number; see Peter Cordes' answer. In practice, though, it should be easy to avoid such garbage in the upper bits when programming with intrinsics.
If you want to vectorize your code and need a packed double compare, then use the intrinsic `_mm_cmpgt_pd()` instead of `_mm_cmpgt_sd()`.
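For example, a small sketch of what the packed variant gives you (an illustrative helper of my own, combining the compare with a bitwise AND just to show the per-element mask in use):

```c
#include <emmintrin.h>

// Per-element: result[i] = (a[i] > b[i]) ? a[i] : +0.0
// Both lanes are compared, and the all-ones/all-zeros mask drives a
// branchless select via a bitwise AND.
static inline __m128d keep_where_greater(__m128d a, __m128d b)
{
    __m128d mask = _mm_cmpgt_pd(a, b);  // a[i] > b[i] ? all-ones : all-zeros
    return _mm_and_pd(mask, a);         // keep a[i] where true, else +0.0
}
```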
Upvotes: 5