gcc inline asm x86 CPU flags as input dependency

Question

I want to write a function that adds two 16-bit integers with overflow detection. I have a generic variant written in portable C, but it is not optimal for an x86 target, because the CPU already computes the overflow flag when it executes ADD/SUB/etc. Of course, there is __builtin_add_overflow(), but in my case it generates some boilerplate. So I wrote the following code:
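For comparison, a portable baseline can be sketched with the GCC/Clang builtin. The helper names below are hypothetical and are not the asker's generic variant. Note that seto reports signed overflow (the OF flag), while unsigned 16-bit overflow corresponds to the carry flag (CF, setc), so the sketch shows both:

```cpp
#include <cstdint>

// Hypothetical helper: signed 16-bit overflow, i.e. what OF/seto reports for addw.
// The uint16_t -> int16_t casts wrap modulo 2^16 on GCC/Clang
// (implementation-defined before C++20).
static bool add_i16_overflows(uint16_t a, uint16_t b, uint16_t* sum)
{
    int16_t res;
    bool of = __builtin_add_overflow(static_cast<int16_t>(a),
                                     static_cast<int16_t>(b), &res);
    *sum = static_cast<uint16_t>(res);   // builtin stores the wrapped result
    return of;
}

// Hypothetical helper: unsigned 16-bit overflow, i.e. what CF/setc reports.
static bool add_u16_carries(uint16_t a, uint16_t b, uint16_t* sum)
{
    return __builtin_add_overflow(a, b, sum);
}
```

The two helpers differ only in the types handed to the builtin, which decides whether "overflow" means leaving the signed or the unsigned 16-bit range.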

#include <cstdint>

struct result_t
{
    uint16_t src;
    uint16_t dst;
    uint8_t  of;
};

static void add_u16_with_overflow(result_t& r)
{
    char of, cf;
    asm (
        " addw %[dst], %[src] " 
        : [dst] "+mr"(r.dst)//, "=@cco"(of), "=@ccc"(cf)
        : [src] "imr" (r.src) 
        : "cc"
        );

    asm (" seto %0 " : "=rm" (r.of) );

}

uint16_t test_add(uint16_t a, uint16_t b)
{
    result_t r;
    r.src = a;
    r.dst = b;
    add_u16_with_overflow(r);
    add_u16_with_overflow(r);

    return (r.dst + r.of); // use r.dst and r.of to prevent them from being discarded
}

I've experimented with it at https://godbolt.org/g/2mLF55 (gcc 7.2, -O2 -std=c++11), and it produces:

test_add(unsigned short, unsigned short):
  seto %al 
  movzbl %al, %eax
  addw %si, %di 
  addw %si, %di 
  addl %esi, %eax
  ret

So seto %0 has been moved ahead of both additions. It seems GCC thinks there is no dependency between two consecutive asm() statements, and the "cc" clobber has no effect on the flags dependency.

I can't use volatile, because seto %0 (or the whole function) can, and should, be optimized out if the result (or part of it) is not used.

I can add a dependency on r.dst, asm (" seto %0 " : "=rm" (r.of) : "rm"(r.dst) );, and then the reordering does not happen. But that is not the "right thing": the compiler can still insert code that changes the flags (but not r.dst) between the add and the seto statements.
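A common workaround, sketched here under the assumption of an x86 target with GCC or Clang, is to keep the add and the seto in a single asm() template: the compiler cannot schedule anything between the instructions of one template, so nothing can clobber the flags in between. The helper name add_u16_seto is hypothetical. Note also that in AT&T syntax the source operand comes first, so addw %[src], %[dst] computes dst += src.

```cpp
#include <cstdint>

#if !(defined(__x86_64__) || defined(__i386__))
#error "x86-only sketch"
#endif

// Hypothetical helper: dst += src, returns the OF flag of that addition.
// Both instructions live in one template, so no compiler-generated code
// can be placed between the addw and the seto.
static inline uint8_t add_u16_seto(uint16_t src, uint16_t& dst)
{
    uint8_t of;
    asm ("addw %[src], %[dst]\n\t"   // AT&T order: dst += src, sets OF
         "seto %[of]"                // copy OF into a byte
         : [dst] "+r" (dst),
           [of] "=qm" (of)           // "q": byte-addressable register, also valid on i386
         : [src] "ri" (src)
         : "cc");
    return of;
}
```

Because the statement is not volatile and has outputs, GCC may still delete it when neither dst nor the returned flag is used, which matches the requirement that unused results be optimized out.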

Is there a way to say "this asm() statement changes some CPU flags" and "this asm() statement uses some CPU flags", so that there is a dependency between the statements and reordering is prevented?
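Since GCC 6, x86 asm statements can export flags directly through flag output operands (=@ccCOND, e.g. =@cco for OF and =@ccc for CF), which is what the commented-out part of the original code points toward. A sketch of add_u16_with_overflow rewritten that way follows. The extra cf field is an addition for illustration, and the __builtin_add_overflow fallback for compilers that do not define __GCC_ASM_FLAG_OUTPUTS__ is an assumption, not part of the question.

```cpp
#include <cstdint>

struct result_t
{
    uint16_t src;
    uint16_t dst;
    uint8_t  of;
    uint8_t  cf;   // added for illustration: unsigned carry-out
};

// dst += src, recording OF (signed overflow) and CF (unsigned carry).
// The add and the flag reads happen in ONE asm statement, so the
// compiler cannot reorder or interpose anything between them.
static inline void add_u16_with_overflow(result_t& r)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
    bool of, cf;
    asm ("addw %[src], %[dst]"          // AT&T order: dst += src
         : [dst] "+rm" (r.dst),
           "=@cco" (of),                 // OF after the add
           "=@ccc" (cf)                  // CF after the add
         : [src] "ri" (r.src));          // "ri" avoids a memory-to-memory add
    r.of = of;
    r.cf = cf;
#else
    // Fallback (assumption): portable builtins when flag outputs are unavailable.
    int16_t sres;
    uint16_t ures;
    r.of = __builtin_add_overflow(static_cast<int16_t>(r.dst),
                                  static_cast<int16_t>(r.src), &sres);
    r.cf = __builtin_add_overflow(r.dst, r.src, &ures);
    r.dst = ures;
#endif
}
```

Keeping the add and the flag read in one statement is the key design choice: flags are not carried from one asm() statement to the next, and the "cc" clobber only tells GCC that the flags are destroyed, not that they hold a value someone else will read. The asm stays non-volatile, so it can still be removed when its outputs are unused.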
