Shift_Left
Shift_Left

Reputation: 1243

Verify that in a range of bits only one is on

I've written this algorithm where three condition bits are evaluated to determine how text should be aligned based on a given address.

7 = Justify right
6 = Justify center
5 = Justify left.

Currently, there isn't a default implemented so if nothing is defined, ZF is returned indicating nothing happened. Text not being on the screen where expected will be an indicator too.

        test    al, 111000000B
        jz      Done

Now this solution works for me because as part of the ABI I'm designing, where CF denotes error. This addresses the need and will work for 16/32/64 bit, but I want to think there is a more efficient way or at least one that isn't so weighty.

    0  0FBC46FE          bsf ax,[bp-0x2]
    4  88C3              mov bl,al
    6  0FBD46FE          bsr ax,[bp-0x2]
    A  28C3              sub bl,al
    C = 12 bytes

As there is special processing for each condition at epilogue, I could catch it there by rotating bit AH into carry and keep doing that until CY and ZF. If there is a carry and non-zero then I know there was more than 1 bit set or I could just ignore extraneous bits and then function would be prioritized by bit position.

Upvotes: 1

Views: 392

Answers (1)

Peter Cordes
Peter Cordes

Reputation: 365792

You might want to redesign this to use 3 or 4 values in a 2-bit field, instead of a bitmap, if they're all mutually exclusive anyway. (See my comments on the question).

After masking off the non-bitmap bits:

One pretty good way is to use the (n & (n - 1)) == 0 bithack to an integer for being a power of 2. (Or actually, to test that it has at most 1 bit set, not exactly 1 bit set, because that expression is true when n == 0).

You can implement that with lea edx, [rax-1] / and edx, eax / add edx, -1, which leaves CF set if eax had more than 1 bit set, otherwise CF is clear. (add edx, -1 carries for every edx value except 0).


On CPUs with BMI1

VEX prefixes are not usable in real-mode or virtual-8086 mode (for compat with some old real-mode conventions of using the same previously-illegal bit pattern as a trap). But they are usable in 16-bit protected mode I think, and definitely in 32 / 64-bit mode. This is relevant because blsr is encoded with a VEX prefix.

There's an extremely efficient way to implement that bithack with blsr (Reset Lowest Set Bit), because it sets flags in a way that's useful for this. Almost as if instruction-set architects knew what they were doing...:

and   eax, 11100000b     ; clear other bits

; requires BMI1
blsr  edx, eax           ; destination = a dummy register, also sets flags

jnz  multiple_bits_were_set       ; and thus edx still has a bit set
jc   input_was_zero

i.e. it does everything you want in one instruction, even using CF in a way that works for your code without a cmc. Your multiple_bits_were_set branch target could just stc and fall into input_was_zero.

There's no JCC that jumps if (CF==1 or ZF==0), only ja (CF=0 and ZF=0) and jbe (CF=1 or ZF=1). I don't think even cmc can help; we'd really need to complement ZF, not CF. (Skylake and later CPUs don't have flag-merging uops or partial flag stalls, and at worst on Haswell it would be a merging uop not a stall, if you were doing something where cmc was useful. Perhaps with a different use-case, since it doesn't seem useful with blsr.)

You might want 2 branches. Or since you're probably branching on which bit was set, maybe only use blsr to check for multiple bits set, and then use cmp al, 01000000b / jae bits_1_or_2 to sort things out into highest or 2nd highest vs. lowest (of the top 3) or none set.


If you want to only branch once for no bits or too many bits:

 AND   eax, mask

 blsr  edx, eax     ; edx = 0 iff no more than 1 bit was set in the input
 ; CF=1 iff eax==0
 adc   edx, 0
 jnz   not_exactly_one_bit_set

The adc can't wrap to zero because blsr always leaves the low bit clear, by definition. edx can only be zero (and thus ZF can only be set) if edx=0 after BLSR, and the adc didn't add anything.

If your bitmask doesn't include the high bit of the register, adc edx,edx would also work, saving one byte. But adc with an immediate 0 may is special-cased to 1 uop even on Haswell, where the general case is 2 uops there. (Later Intel CPUs always have 1 uop adc, earlier Intel CPUs don't have BMI1. AMD CPUs always have 1 uop adc.)

Upvotes: 3

Related Questions