Reputation: 86333
As the title sais I want to find a successive run of n one-bits in a bit-array of variable size (M).
The usual use-case is N <= 8 and M <= 128
I do this operation a lot in an innerloop on an embedded device. Writing a trivial implementation is easy but not fast enough for my taste (e.g. brute force search until a solution is found).
I wonder if anyone has a more elegant solution in his bag of tricks.
Upvotes: 3
Views: 2275
Reputation: 137780
This can be easily solved, and you don't need a count-zeroes instruction.
y = x ^ x-1
gives you a string of 1's up to the least-significant 1-bit in x
.
y + 1
is the next individual bit which may be 1 or 0, and
x ^ x-(y+1)
gives you a string of 1's from that bit until the next 1-bit.
Then you can multiply the search pattern by (y+1) and recurse…
I'm working on an algorithm to fetch the strings… hold on…
Yeah… easily solved… while I'm working on that, note there's another trick. If you divide a word into substrings of n
bits, then a series of ≥2n-1
1's must cover at least one substring. For simplicity, assume the substrings are 4 bits and words are 32 bits. You can check the substrings simultaneously to quickly filter the input:
const unsigned int word_starts = 0x11111111;
unsigned int word = whatever;
unsigned int flips = word + word_starts;
if ( carry bit from previous addition ) return true;
return ~ ( word ^ flips ) & word_starts;
This works because, after the addition operation, each bit (besides the first) in flips
corresponding to a 1-bit in in word_starts
equals (by the definition of binary addition)
word ^ carry_from_right ^ 1
and you can extract the carry bits by xor
ing with word again, negating, and ANDing. If no carry bits are set, a 1-string won't exist.
Unfortunately, you have to check the final carry bit, which C can't do but most processors can.
Upvotes: 1
Reputation: 10119
This might be a bit over the top for what you are doing but I needed something heavyweight for a custom file system block allocation. If N < 32 then you can remove the second half of the the code.
For backward compatibility the most significant bit of the first word is regarded as bit 0.
Note that the algorithm uses a sentinel word (all zeros) at the end to stop any search rather than continually checking for end of array. Also note that the algorithm allows searching to start from any position in the bit array (typically the end of the last successful allocation) rather than always starting from the beginning of the bit array.
Supply your own compiler specific msbit32() function.
#define leftMask(x) (((int32_t)(0x80000000)) >> ((x) - 1)) // cast so that sign extended (arithmetic) shift used
#define rightMask(x) (1 << ((x) - 1))
/* Given a multi-word bitmap array find a run of consecutive set bits and clear them.
*
* Returns 0 if bitrun not found.
* 1 if bitrun found, foundIndex contains the bit index of the first bit in the run (bit index 0 is the most significant bit of the word at lowest address).
*/
static int findBitRun(int runLen, uint32_t *pBegin, uint32_t *pStartMap, uint32_t *pEndMap, uint32_t *foundIndex)
{
uint32_t *p = pBegin;
unsigned int bit;
if (runLen == 1)
{ // optimise the simple & hopefully common case
do {
if (*p)
{
bit = msbit32(*p);
*p &= ~(1 << bit);
*foundIndex = ((p - pStartMap) * 32ul) + (31 - bit);
return 1;
}
if (++p > pEndMap)
{
p = pStartMap;
}
} while (p != pBegin);
}
else if (runLen < 32)
{
uint32_t rmask = (1 << runLen) - 1;
do {
uint32_t map = *p;
if (map)
{
// We want to find a run of at least runLen consecutive ones within the word.
// We do this by ANDing each bit with the runLen-1 bits to the right
// if there are any ones remaining then this word must have a suitable run.
// The single bit case is handled above so can assume a minimum run of 2 required
uint32_t w = map & (map << 1); // clobber any 1 bit followed by 0 bit
int todo = runLen - 2; // -2 as clobbered 1 bit and want to leave 1 bit
if (todo > 2)
{
w &= w << 2; // clobber 2 bits
todo -= 2;
if (todo > 4)
{
w &= w << 4; // clobber 4 bits
todo -= 4;
if (todo > 8)
{
w &= w << 8; // clobber 8 bits
todo -= 8;
}
}
}
w &= w << todo; // clobber any not accounted for
if (w) // had run >= runLen within word
{
bit = msbit32(w); // must be start of left most run
*p &= ~(rmask << ((bit + 1) - runLen));
*foundIndex = ((p - pStartMap) * 32ul) + (31 - bit);
return 1;
}
else if ((map & 1) && (p[1] & 0x80000000ul)) // assumes sentinel at end of map
{
// possibly have a run overlapping two words
// calculate number of bits at right of current word
int rbits = msbit32((map + 1) ^ map);
int lmask = rmask << ((32 + rbits) - runLen);
if ((p[1] | lmask) == p[1])
{
p[0] &= ~((1 << rbits) - 1);
p[1] &= ~lmask;
*foundIndex = ((p - pStartMap) * 32ul) + (32 - rbits);
return 1;
}
}
}
if (++p > pEndMap)
{
p = pStartMap;
}
} while (p != pBegin);
}
else // bit run spans multiple words
{
pEndMap -= (runLen - 1)/32; // don't run off end
if (pBegin > pEndMap)
{
pBegin = pStartMap;
}
do {
if ((p[0] & 1) && ((p[0] | p[1]) == 0xfffffffful)) // may be first word of run
{
uint32_t map = *p;
uint32_t *ps = p; // set an anchor
uint32_t bitsNeeded;
int sbits;
if (map == 0xfffffffful)
{
if (runLen == 32) // easy case
{
*ps = 0;
*foundIndex = (ps - pStartMap) * 32ul;
return 1;
}
sbits = 32;
}
else
{
sbits = msbit32((map + 1) ^ map);
}
bitsNeeded = runLen - sbits;
while (p[1] == 0xfffffffful)
{
if (bitsNeeded <= 32)
{
p[1] = ~(0xfffffffful << (32 - bitsNeeded));
while (p != ps)
{
*p = 0;
--p;
}
*ps &= ~rightMask(sbits);
*foundIndex = ((p - pStartMap) * 32ul) + (32 - sbits);
return 1;
}
bitsNeeded -= 32;
if (++p == pBegin)
{
++pBegin; // ensure we terminate
}
}
if ((bitsNeeded < 32) & (p[1] & 0x80000000ul))
{
uint32_t lmask = leftMask(bitsNeeded);
if ((p[1] | lmask) == p[1])
{
p[1] &= ~lmask;
while (p != ps)
{
*p = 0;
--p;
}
*ps &= ~rightMask(sbits);
*foundIndex = ((p - pStartMap) * 32ul) + (32 - sbits);
return 1;
}
}
}
if (++p > pEndMap)
{
p = pStartMap;
}
} while (p != pBegin);
}
return 0;
}
Upvotes: 1
Reputation: 6724
If you're on an intel-compatible platform, the BSF (Bit Scan Forward) and BSR (Bit Scan Reverse) asm instructions could help you drop the first and last zero bits. This would be more efficient than the brute-force approach.
Upvotes: 1
Reputation: 43477
int nr = 0;
for ( int i = 0; i < M; ++i )
{
if ( bits[i] )
++nr;
else
{
nr = 0; continue;
}
if ( nr == n ) return i - nr + 1; // start position
}
What do you mean by brute force? O(M*N) or this O(M) solution? if you meant this, then I'm not sure how much more you can optimize things.
It's true we could achieve constant improvements by walking over every byte instead of every bit. This comes to mind: When I say byte I mean a sequence of N bits this time.
for ( int i = 0; i < M; i += N )
if ( bits[i] == 0 ) // if the first bit of a byte is 0, that byte alone cannot be a solution. Neither can it be a solution in conjunction with the previous byte, so skip it.
continue;
else // if the first bit is 1, then either the current byte is a solution on its own or it is a solution in conjunction with the previous byte
{
// search the bits in the previous byte.
int nrprev = 0;
while ( i - nrprev >= 0 && bits[i - nrprev] ) ++nrprev;
// search the bits in the current byte;
int nrcurr = 0;
while ( bits[i + nrcurr + 1] && nrcurr + nrprev <= N ) ++nrcurr;
if ( nrcurr + nrprev >= N ) // solution starting at i - nrprev + 1.
return i - nrprev + 1;
}
Not tested. Might need some additional conditions to ensure correctness, but the idea seems sound.
Upvotes: 4
Reputation: 19620
Unroll the inner loop with a lookup table.
There are four classes of byte:
00000001 - // Bytes ending with one or more 1's. These start a run.
11111111 - // All 1's. These continue a run.
10000000 - // Bytes starting with 1's but ending with 0's. These end a run.
10111000 - // All the rest. These can be enders or short runs.
Make a lookup table that lets you distinguish these. Then process the bit array one byte at a time.
edit
I'd like to be a little less vague about the contents of the lookup table. In specific, I'll suggest that you need three tables, each with 256 entries, for the following characteristics:
Number of bits set.
Number of bits set before first zero.
Number of bits set after last zero.
Depending on how you do it, you may not need the first.
Upvotes: 1
Reputation: 54584
Simple SWAR answer:
Given the value V
you're inspecting, take N
M
-bit-wide registers. For all n
in N
, set register n
to V >> n
.
Dump bitwise AND(all N)
into another M-wide register. Then simply find the bits set in that register and that will be the start of the an all-bits run.
I'm sure if you don't have an M
-bit-wide registers you can adapt this to a smaller register size.
Upvotes: 1
Reputation: 45057
I do something similar on an embedded device running on a MIPS core. The MIPS architecture includes the CLZ
instruction ("Count Leading Zeroes") which will return the number of leading zero-bits for the specified register. If you need to count the leading one-bits, simply invert the data before calling CLZ
.
Example, assuming you have a C-language function CLZ
as an alias for the assembly instruction:
unsigned numbits = 0, totalbits = 0;
while (data != 0 && numbits != N) {
numbits = CLZ(data); // count leading zeroes
data <<= numbits; // shift off leading zeroes
totalbits += numbits; // keep track of how many bits we've shifted off
numbits = CLZ(~data); // count leading ones
data <<= numbits; // shift off leading ones
totalbits += numbits; // keep track of how many bits we've shifted off
}
At the end of this loop, totalbits
will indicate the offset (in bits, from the left) of the first run of N consecutive one-bits. Each line inside the loop can be represented in a single assembly instruction (except the fourth line, which requires a second for the invert operation).
Other non-MIPS architectures may have similar instructions available.
Upvotes: 1