Reputation: 3
in the past week I struggled with this problem, and it seems I can't handle it finally. Given an arbitrary 64bit unsigned int number, if it contains the binary pattern of 31 (0b11111) at any bit position, at any bit settings, the number is valid, otherwise not.
E.g.:
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 1111 valid
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1110 valid
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0111 1100 valid
0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1000 0000 0000 0000 0000 0000 valid
0000 0000 0011 1110 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1110 valid etc...
also:
0000 0000 0000 1100 0000 0100 0000 0000 1100 1100 0000 0000 0100 0000 0001 1111 valid
1110 0000 0000 0100 0000 0000 0011 0000 0000 0000 0000 0000 0000 0000 0011 1110 valid
0000 0000 1000 0010 0000 0010 0000 0000 0000 0000 0010 0000 0000 0000 0111 1100 valid
0000 0010 0000 0110 0000 0000 0000 0000 0000 1111 1000 0000 0000 0100 0110 0000 valid
0000 0000 0011 1110 0000 0000 0011 0000 0000 1000 0000 0000 0000 0000 0011 1110 valid etc...
but:
0000 0000 0000 1100 0000 0100 0000 0000 1100 1100 0000 0000 0100 0000 0000 1111 invalid
1110 0000 0000 0100 0000 0000 0011 0000 0000 0000 0000 0000 0000 0000 0011 1100 invalid
0000 0000 1000 0010 0000 0010 0000 0000 0000 0000 0010 0000 0000 0000 0101 1100 invalid
0000 0010 0000 0110 0000 0000 0000 0000 0000 1111 0000 0000 0000 0100 0110 0000 invalid
0000 0000 0011 1010 0000 0000 0011 0000 0000 1000 0000 0000 0000 0000 0001 1110 invalid etc...
You've got the point...
But that is just the first half of the problem. The second one is, it needs to be implemented without loops or branches (which was done already) for speed increasing, using only one check by arithmetic and/or logical, bit manipulation kind of code.
The closest I can get, is a modified version of Bit Twiddling Hacks "Determine if a word has a zero byte" ( https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord ) to check five bit blocks of zeros (negated 11111). But it still has the limit of the ability to check only fixed blocks of bits (bit 0 to 4, bit 5 to 9, etc...) not at any bit position (as in the examples above).
Any help would be greatly appreciated, since I'm totally exhausted.
Sz
Upvotes: 0
Views: 158
Reputation: 27340
Let me restate your goal in a slightly different formulation:
I want to check whether an integer contains 5 consecutive high bits.
From this formulation the following solution explains itself. It is written in C++.
bool contains11111(uint64_t i) {
return i & (i << 1) & (i << 2) & (i << 3) & (i << 4);
}
This approach also works for any other pattern. For instance, if you wanted to check for 010
you would use ~i & (i<<1) & ~(i<<2)
. In your example the pattern's length is a prime number, but for composite numbers and especially powers of two you can optimize this even further. For instance, when searching for 1111 1111 you could use i&=i<<1; i&=i<<2; i&=i<<4; return i
.
To test this on your examples I used the following program. The literals inside testcases[]
were generated by running your examples through the bash command ...
{ echo 'ibase=2'; tr -dc '01\n' < fileWithYourExamples; } |
bc | sed 's/.*/UINT64_C(&),/'
#include <cinttypes>
#include <cstdio>
bool contains11111(uint64_t i) {
return i & (i << 1) & (i << 2) & (i << 3) & (i << 4);
}
int main() {
uint64_t testcases[] = {
// valid
UINT64_C(31),
UINT64_C(62),
UINT64_C(124),
UINT64_C(260046848),
UINT64_C(17451448556060734),
UINT64_C(3382101189607455),
UINT64_C(16142027170561130558),
UINT64_C(36593945997738108),
UINT64_C(145804038196167776),
UINT64_C(17451654848708670),
// invalid
UINT64_C(3382101189607439),
UINT64_C(16142027170561130556),
UINT64_C(36593945997738076),
UINT64_C(145804038187779168),
UINT64_C(16325754941866014),
};
for (uint64_t i : testcases) {
std::printf("%d <- %016" PRIx64 "\n", contains11111(i), i);
}
}
This prints
1 <- 000000000000001f
1 <- 000000000000003e
1 <- 000000000000007c
1 <- 000000000f800000
1 <- 003e00000000003e
1 <- 000c0400cc00401f
1 <- e00400300000003e
1 <- 008202000020007c
1 <- 020600000f800460
1 <- 003e00300800003e
0 <- 000c0400cc00400f
0 <- e00400300000003c
0 <- 008202000020005c
0 <- 020600000f000460
0 <- 003a00300800001e
Upvotes: 2