Reputation: 1243
I've written this algorithm where three condition bits are evaluated to determine how text should be aligned based on a given address.
7 = Justify right
6 = Justify center
5 = Justify left.
Currently, there isn't a default implemented so if nothing is defined, ZF is returned indicating nothing happened. Text not being on the screen where expected will be an indicator too.
test al, 111000000B
jz Done
Now this solution works for me because as part of the ABI I'm designing, where CF denotes error. This addresses the need and will work for 16/32/64 bit, but I want to think there is a more efficient way or at least one that isn't so weighty.
0 0FBC46FE bsf ax,[bp-0x2]
4 88C3 mov bl,al
6 0FBD46FE bsr ax,[bp-0x2]
A 28C3 sub bl,al
C = 12 bytes
As there is special processing for each condition at epilogue, I could catch it there by rotating bit AH into carry and keep doing that until CY and ZF. If there is a carry and non-zero then I know there was more than 1 bit set or I could just ignore extraneous bits and then function would be prioritized by bit position.
Upvotes: 1
Views: 392
Reputation: 365792
You might want to redesign this to use 3 or 4 values in a 2-bit field, instead of a bitmap, if they're all mutually exclusive anyway. (See my comments on the question).
After masking off the non-bitmap bits:
One pretty good way is to use the (n & (n - 1)) == 0
bithack to an integer for being a power of 2. (Or actually, to test that it has at most 1 bit set, not exactly 1 bit set, because that expression is true when n == 0
).
You can implement that with lea edx, [rax-1]
/ and edx, eax
/ add edx, -1
, which leaves CF set if eax
had more than 1 bit set, otherwise CF is clear. (add edx, -1
carries for every edx value except 0
).
VEX prefixes are not usable in real-mode or virtual-8086 mode (for compat with some old real-mode conventions of using the same previously-illegal bit pattern as a trap). But they are usable in 16-bit protected mode I think, and definitely in 32 / 64-bit mode. This is relevant because blsr
is encoded with a VEX prefix.
There's an extremely efficient way to implement that bithack with blsr
(Reset Lowest Set Bit), because it sets flags in a way that's useful for this. Almost as if instruction-set architects knew what they were doing...:
and eax, 11100000b ; clear other bits
; requires BMI1
blsr edx, eax ; destination = a dummy register, also sets flags
jnz multiple_bits_were_set ; and thus edx still has a bit set
jc input_was_zero
i.e. it does everything you want in one instruction, even using CF in a way that works for your code without a cmc
. Your multiple_bits_were_set
branch target could just stc
and fall into input_was_zero
.
There's no JCC that jumps if (CF==1 or ZF==0), only ja
(CF=0 and ZF=0) and jbe
(CF=1 or ZF=1). I don't think even cmc
can help; we'd really need to complement ZF, not CF. (Skylake and later CPUs don't have flag-merging uops or partial flag stalls, and at worst on Haswell it would be a merging uop not a stall, if you were doing something where cmc was useful. Perhaps with a different use-case, since it doesn't seem useful with blsr.)
You might want 2 branches. Or since you're probably branching on which bit was set, maybe only use blsr
to check for multiple bits set, and then use cmp al, 01000000b
/ jae bits_1_or_2
to sort things out into highest or 2nd highest vs. lowest (of the top 3) or none set.
AND eax, mask
blsr edx, eax ; edx = 0 iff no more than 1 bit was set in the input
; CF=1 iff eax==0
adc edx, 0
jnz not_exactly_one_bit_set
The adc
can't wrap to zero because blsr
always leaves the low bit clear, by definition. edx
can only be zero (and thus ZF can only be set) if edx=0 after BLSR, and the adc
didn't add anything.
If your bitmask doesn't include the high bit of the register, adc edx,edx
would also work, saving one byte. But adc
with an immediate 0 may is special-cased to 1 uop even on Haswell, where the general case is 2 uops there. (Later Intel CPUs always have 1 uop adc, earlier Intel CPUs don't have BMI1. AMD CPUs always have 1 uop adc.)
Upvotes: 3