Reputation: 199

MinGW gcc set fp rounding mode

I'm using the gcc compiler, and I want to be able to change the SSE rounding mode quickly. The following code works if I compile it under Linux:

#include <xmmintrin.h>
unsigned int _mxcsr_up = _MM_MASK_MASK | _MM_ROUND_UP;
unsigned int _mxcsr_down = _MM_MASK_MASK | _MM_ROUND_DOWN;
unsigned int _mxcsr_n = _MM_MASK_MASK;

void round_nearest_mode() {
    asm (
    "ldmxcsr %0" : : "m" (_mxcsr_n)
    );
}

void round_up_mode() {
    asm (
    "ldmxcsr %0" : : "m" (_mxcsr_up)
    );
}

void round_down_mode() {
    asm (
    "ldmxcsr %0" : : "m" (_mxcsr_down)
    );
}

But when I compile it under Windows using MinGW, the rounding mode does not change. What is the reason?

Upvotes: 1

Answers (1)

Peter Cordes

Reputation: 363912

The same header that provides the _MM_ROUND_UP constants also defines _mm_setcsr(unsigned int i) and _mm_getcsr(void) intrinsic wrappers around the relevant instructions.

You should normally retrieve the old value, OR or ANDN the bit you want to change, then apply the new value. (e.g. mxcsr &= ~SOME_BITS). You won't find many examples that just use LDMXCSR without doing a STMXCSR first.
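A minimal sketch of that read-modify-write sequence, using the _mm_getcsr / _mm_setcsr intrinsics and the _MM_ROUND_MASK constant from xmmintrin.h. The helper name set_sse_rounding is made up for illustration; it is not part of any header.

```c
#include <xmmintrin.h>

/* Hypothetical helper: change only the rounding-control field of MXCSR.
   rc is one of _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP or
   _MM_ROUND_TOWARD_ZERO. Returns the previous MXCSR value so the
   caller can restore it later with _mm_setcsr(). */
unsigned int set_sse_rounding(unsigned int rc)
{
    unsigned int old = _mm_getcsr();                      /* STMXCSR */
    unsigned int cleared = old & ~(unsigned int)_MM_ROUND_MASK;
    _mm_setcsr(cleared | rc);                             /* LDMXCSR */
    return old;
}
```

The exception-mask and flag bits are left as they were; only the two rounding-control bits change. The same header also provides _MM_SET_ROUNDING_MODE and _MM_GET_ROUNDING_MODE macros, which may be all you need.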

Oh, I think you're actually doing that part wrong in your code. I haven't looked at how _MM_MASK_MASK is defined, but its name includes the word MASK. You're ORing it with various other constants, instead of ANDing it. You're probably setting the MXCSR to the same value every time, because you're ORing everything with _MM_MASK_MASK, which I assume has all the rounding-mode bits set.
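The assumption about _MM_MASK_MASK can be checked rather than guessed. To my knowledge, GCC's xmmintrin.h defines _MM_MASK_MASK as 0x1f80 (the exception-mask bits) and _MM_ROUND_MASK as 0x6000 (the rounding-control field), so the two would not overlap, but it is worth confirming against the header your toolchain actually ships. The helper names below are hypothetical.

```c
#include <stdio.h>
#include <xmmintrin.h>

/* Hypothetical helper: nonzero if OR-ing `bits` into MXCSR would set
   any bit of the rounding-control field. */
int overlaps_rounding_field(unsigned int bits)
{
    return (bits & (unsigned int)_MM_ROUND_MASK) != 0;
}

/* Print the constants so the header in use can be inspected directly. */
void print_mxcsr_constants(void)
{
    printf("_MM_MASK_MASK  = 0x%04x\n", (unsigned int)_MM_MASK_MASK);
    printf("_MM_ROUND_MASK = 0x%04x\n", (unsigned int)_MM_ROUND_MASK);
    printf("_MM_ROUND_UP   = 0x%04x\n", (unsigned int)_MM_ROUND_UP);
    printf("_MM_ROUND_DOWN = 0x%04x\n", (unsigned int)_MM_ROUND_DOWN);
    printf("current MXCSR  = 0x%04x\n", _mm_getcsr());
}
```

If overlaps_rounding_field(_MM_MASK_MASK) returns 0 on your setup, the OR in the question is not what keeps the rounding mode from changing, and the cause has to be looked for elsewhere.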

As @StoryTeller points out, you don't need inline asm or intrinsics to change rounding modes, since the four rounding modes provided by x86 hardware match the four defined by fenv.h in C99 (FE_DOWNWARD, FE_TONEAREST (the default), FE_TOWARDZERO, and FE_UPWARD), which you can set with e.g. fesetround(FE_DOWNWARD);.

If you want to change rounding modes on the fly and make sure the optimizer doesn't reorder any FP ops to a place where the rounding mode was set differently, you need #pragma STDC FENV_ACCESS ON, but gcc doesn't support it. See also this gcc bug from 2008 which is still open: Optimization generates incorrect code with -frounding-math option (#pragma STDC FENV_ACCESS not implemented).

Doing it manually with asm volatile still won't prevent CSE from treating an x/y computed earlier as the same value and skipping the recomputation after the asm statement, unless you make x or y a read-write operand of the asm statement (even though the asm never actually touches it), e.g.

asm volatile("" : "+g"(x));  // optimizer must not make any assumptions about x's value.

You could put the LDMXCSR inside that same inline-asm statement, to guarantee that the point where the rounding mode changed is also the point where the compiler treats x as having changed.
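One way that combination might look, as a sketch only. It assumes GNU C extended asm, AT&T syntax, and scalar double math done in SSE registers (the default on x86-64, or -mfpmath=sse on 32-bit). The function name div_rounded_sse is hypothetical, and the "+x" constraint (any SSE register) is used here instead of "+g" so the double stays in a register.

```c
#include <xmmintrin.h>

/* Hypothetical helper: compute x / y with the MXCSR rounding field set
   to rc (_MM_ROUND_UP, _MM_ROUND_DOWN, ...), then restore MXCSR. */
double div_rounded_sse(double x, double y, unsigned int rc)
{
    unsigned int saved = _mm_getcsr();
    unsigned int csr = (saved & ~(unsigned int)_MM_ROUND_MASK) | rc;

    /* Load the new MXCSR and tell the compiler x may have changed,
       so x / y below cannot be reused from, or moved to, an earlier point. */
    __asm__ volatile("ldmxcsr %1" : "+x"(x) : "m"(csr));

    double q = x / y;

    /* Restore MXCSR and list q as read-write, so the division has to
       be finished before the old mode comes back. */
    __asm__ volatile("ldmxcsr %1" : "+x"(q) : "m"(saved));
    return q;
}
```

Each asm statement pins one side of the computation: the first makes the input depend on the mode change, the second makes the mode restore depend on the result. Any other FP work between the two statements that does not flow through x or q is not covered by this trick.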

Upvotes: 1
