x86-64 Zero Flag is clearing between inline calls (and another problem)

Question

I am using the bsf x86-64 instruction, described on page 210 of Intel's developer's manual. Essentially, if a least significant 1 bit is found, its bit index is stored in the destination operand.

Furthermore, the ZF flag is set to 1 if the entire source operand is 0; otherwise, the ZF flag is cleared.

I am compiling my C code with inline x86-64 assembly instructions. I have defined a C function which invokes the bsf instruction:

uint64_t bitScanForward(T_bitboard b) {
    __asm__(
       "bsf %rcx,%rax\n"
       "leave\n"
       "ret\n"
    );
}

and also another C function which checks the status of the ZF bit in the flags register:

uint64_t isZFSet() {
    printf("\n"); <- This is another problem I am having (see below)...
    __asm__(
        "jz true\n"
        "movq $0,%rax\n"//return false
        "jmp end\n"
        "true:\n"
        "movq $1,%rax\n"//return true
        "end:\n"
        "leave\n"
        "ret\n"
    );
}

I have tested these and found that the ZF flag is always cleared, even when the bsf instruction is applied to the number zero, seemingly going against the specification.

//Calling function...
//Do stuff...
bitScanForward(0ULL);//ULL is 64 bit on my machine
if(isZFSet()){//ZF flag *should* be set here but it's not
   printf("ZF flag is set\n");
}
//More stuff...

I suspect the ZF flag is being cleared somewhere between leaving one block of inline assembly and entering the next.

How can I ensure that the flag in the above code is set as specified in the documentation? (I don't want to change much of my code or design)

My "other problem" is that if I don't include the printf statement in isZFSet, the function seemingly doesn't execute. Totally bizarre. Can anyone explain why?

zwol · Accepted Answer

You are treating an aggressively optimizing C compiler as if it were a macro assembler. That just plain isn't going to work. To get GCC to emit correct code in the presence of assembly inserts, you have to annotate the inserts with complete information about the registers and memory regions that are affected by the assembly code, and you have to use ancillary C statements to mesh them with the surrounding code. Even then, there are things the assembly insert cannot do at all. I urge you to scrap this entire mess and instead use the __builtin_ctzll intrinsic, as suggested in the comments on the question.
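For reference, a minimal sketch of the intrinsic-based approach might look like the following. The helper name count_trailing_zeros64 is hypothetical, and returning 64 for a zero input is an assumption chosen to match the assembly versions further down. The explicit zero check is there because the result of __builtin_ctzll(0) is undefined.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the question).
   Returns the index of the lowest set bit of b, or 64 when b == 0.
   __builtin_ctzll(0) is undefined, so zero is handled separately. */
static inline unsigned count_trailing_zeros64(uint64_t b) {
    if (b == 0) {
        return 64;
    }
    return (unsigned)__builtin_ctzll(b);
}
```

On x86-64, GCC and Clang typically compile this down to a bsf or tzcnt plus a conditional move, with no inline assembly to maintain.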

Now, to specifics. Your first function is incorrect because GCC does not support use of leave or ret inside an assembly insert. (More generally, assembly inserts may not alter the stack pointer, and may only jump to designated labels within the same function.) The correct way to use bsf from a GCC-style assembly insert is with "extended asm" with input and output operands:

uint64_t bitScanForward(uint64_t b) {
    uint64_t ret;
    asm ("bsf %1, %0" : "=r" (ret) : "r" (b));
    return ret;
}

You must declare a C variable to receive the output of the operation, and explicitly return that variable; having bsf write to %rax would not work (unlike how it was in old MSVC). BSF accepts any two registers as operands, so there is no need to use constraints more specific than r.

Your second function is incorrect because you didn't tell GCC that the condition codes were meaningful after bitScanForward, and because GCC does not support using the condition-code register as an input to an assembly insert. In order to read the ZF output from bsf you must do so within the same assembly insert that invoked bsf:

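uint64_t countTrailingZeroes(uint64_t b) {
    uint64_t ret;
    /* The cast keeps operand 2 at 64 bits so it matches the width of
       operand 0; a plain int 64 would be given a 32-bit register. */
    asm ("bsf %1, %0\n\t"
         "cmove %2, %0"
         : "=&r" (ret)
         : "r" (b), "rm" ((uint64_t) 64));
    return ret;
}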

This requires special care -- see how the constraint on operand 0 is now =&r instead of just =r? Without that, GCC is liable to think it can put operand 2 in the same register as operand 0.

Alternatively, you can specify that ZF is an output, which is supported (see the "flag output operands" section of the manual) and then supply a default value from C:

uint64_t countTrailingZeroes(uint64_t b) {
    uint64_t ret;
    int zf;
    asm ("bsf %2, %0"  
         : "=r" (ret), "=@ccz" (zf) : "r" (b));
    if (zf) ret = 64;
    return ret;
}
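If you want to keep something close to the original two-step design (scan, then ask whether the input was zero), one option is to return both pieces of information from a single function. The sketch below is only an illustration: bit_scan_forward and its out-parameter are hypothetical names, and the fallback branch assumes a compiler that provides __builtin_ctzll. It checks __GCC_ASM_FLAG_OUTPUTS__, which GCC defines when flag output operands are available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical replacement for the bitScanForward()/isZFSet() pair.
   Returns true and stores the index of the lowest set bit in *index
   when b != 0; returns false and leaves *index untouched when b == 0. */
static inline bool bit_scan_forward(uint64_t b, uint64_t *index) {
#if defined(__x86_64__) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    uint64_t pos;
    int zf;
    /* ZF is captured by the same insert that executes bsf. */
    __asm__ ("bsf %2, %0" : "=r" (pos), "=@ccz" (zf) : "r" (b));
    if (zf) {
        return false;   /* source was zero; pos is not meaningful */
    }
    *index = pos;
    return true;
#else
    /* Fallback for targets or compilers without flag outputs. */
    if (b == 0) {
        return false;
    }
    *index = (uint64_t)__builtin_ctzll(b);
    return true;
#endif
}
```

The calling code from the question then becomes a single call, for example if (!bit_scan_forward(0ULL, &idx)) { printf("ZF flag is set\n"); }, and no flag state has to survive between two separate functions.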
