Reputation: 2859
This is what I'm currently doing:
int dataLen = 500;
char data[dataLen];
int desired = 1; // between 1 and 6, inclusive
...
char bits[dataLen * 8];
for (int32 j = 0; j < dataLen; j++) {
for (int i = 0; i < 8; i++) {
bits[j*8+i] = ( (data[j] & (1 << (7-i))) ? '1' : '0' );
}
}
int offset = std::search_n(bits, bits + dataLen*8, desired, '0') - bits;
Really nasty, I know, and it's killing performance.
What's the fastest way to find the bit offset of the first set of x
consecutive 0 bits in a char array, where 0 < x < 7
? I'm on GCC with SSE 4.2 so builtins like __builtin_ctz, __builtin_popcountl are an option, I just can't figure out the best way to use them.
Upvotes: 9
Views: 2252
Reputation: 276
I would approach this by taking advantage of x86 little endianness and unaligned memory access capability. Look for candidate words with popcount, then find the position with a loop using __builtin_ctz. This eliminates one loop from your code and avoid the bit search loop if the word isn't a candidate. On big endian machines you would need to use htons(*(unsigned short *)p) to swap the bytes. This requires a machine that allows for unaligned word access. You would also need a fence of on extra 0xFF byte in the end of array.
unsigned char *p;
unsigned short v;
int c;
for (p = data; p < data + sizeof(data); p++)
if (__builtin_popcount(*(unsigned short *)p) <= 16 - desired)
{ for (v = *(unsigned short *)p, c = 0; __builtin_ctz(v) < desired; v>>=1, c++) ;
if (c < 16) // Gotcha @ p starting at bit c
break;
}
if (c == 16 || p >= data + sizeof(data)) // didn't find
else // found
Upvotes: 0
Reputation: 9372
Note these arithmetic tricks:
// remove the trailing one bits
y = x & (x + 1);
x 11100011
+ 1
--------
x+1 11100100
x&(x+1) 11100000
// remove the trailing zero bits
z = y | (y - 1);
y 11100000
- 1
--------
y-1 11011111
y|(y-1) 11111111
// isolate the hole
hole = z - x;
hole = z ^ x;
z 11111111
x 11100011
--------
z^x 00011100
// Now you can count the set bits of the hole.
length = bitcount(hole);
// Or you can make it by computing highbit only (log2).
length = highbit(z^y) - highbit(x^y);
So a possible algorithm would be to use these tricks with large integer arithmetic and loop until length==0 (no more hole) or length>=n (you start next loop with x=z;).
You can emulate the large integer yourself, acting on the collection byte after byte, and stopping when there is no more carry.
This would give something like this:
// return 1-based position of highest bit set in a byte
int highbit(unsigned char c)
{
unsigned char n;
int position = 0;
n = c >> 4;
if( n > 0 ) { c=n; position+=4; };
n = c >> 2;
if( n > 0 ) { c=n; position+=2; };
n = c >> 1;
if( n > 0 ) { c=n; position+=1; };
position += c;
return position;
}
int find_consecutive_zeros( unsigned char *bits , int nbytes , int nzero )
{
int i,nTrailingOnes,nTrailingZero,sizeOfNextHole;
unsigned char x,y;
for(i=0 , x=bits[0]; 1; )
{
// perform y=x&(x+1) to count and remove trailing ones
for(;x==0xFF && i<nbytes-1;x=bits[++i]);
y = x&(x+1);
nTrailingOnes = 8*i + highbit( x^y );
// perform x=y|(y-1); to count and remove trailing zeros
for(;y==0 && i<nbytes-1;y=bits[++i]);
x = y|(y-1);
nTrailingZero = 8*i + highbit( x^y );
sizeOfNextHole = nTrailingZero - nTrailingOnes;
// if size of hole is long enough, return it's low bit rank (0-based)
if( sizeOfNextHole >= nzero ) return nTrailingOnes;
// or return -1 if no more hole
if( sizeOfNextHole == 0 ) return -1;
}
}
You can make it more efficient by using longer than a byte for the base collection...
EDIT: speed up when you have a fixed size for nzero like 6
The algorithm above iterates on all the holes, and may loose time on small holes.
You can avoid that with a precomputed table with small holes filled.
For example 10010101 has 3 holes that are too short, so you can replace it with 11111111.
But you must keep leading and trailing zeros unchanged.
To initialize such a table, you simply take 0xFF and xor with a mask containing 1 bits in place of trailing zeros (x|(x-1))^x
and a mask containing 1 bits in place of leading zeros ((1<<highbit(x))-1)^0xFF
.
Add an exception for 10000001, the sole byte containing 6 zeros between ones.
EDIT2 : I have treated the bit sequence with least significant bit of firt byte first which fits well the arithmetic approach. The question explicitely use most significant bit of first byte first. So I have to reverse the bits to fit the question, which is done while initializing the table...
int reversebits(unsigned char c)
{
c = ((c & 0x0F) << 4) | ((c & 0xF0) >> 4);
c = ((c & 0x33) << 2) | ((c & 0xCC) >> 2);
c = ((c & 0x55) << 1) | ((c & 0xAA) >> 1);
return c;
}
void initializeFillShortHoles(unsigned char fillShortHoles[256])
for(unsigned int x=0;x<256;x++) {
fillShortHoles[reversebits(x)] = ((1<<highbit(x))-1) ^ (x|(x-1)) ^ x;
}
fillShortHoles[0]=0; // no use to reverse bits for those two
fillShortHoles[129]=129;
}
Then you just replace occurrences of x=bits[ i ]
with x=fillShortHoles[ bits[ i ] ]
, and you're done:
int find_6_consecutive_zeros( unsigned char *bits , int nbytes )
{
static unsigned char fillShortHoles[256];
static unsigned char highbitTable[256];
static first=1;
int i,nTrailingOnes,nTrailingZero,sizeOfNextHole;
unsigned char x,y;
if (first)
{
first = 0;
initializeFillShortHoles( fillShortHoles );
for(i=0;i<256;i++) highbitTable[i]=highbit(i);
}
for(x=fillShortHoles[bits[i=0]]; 1; )
{
// perform y=x&(x+1) to count trailing ones
for(;x==0xFF && i<nbytes-1;x=fillShortHoles[bits[++i]]);
y = x&(x+1);
nTrailingOnes = 8*i + highbitTable[ x^y ];
// perform z=y|(y-1); to count trailing zeros
for(;y==0 && i<nbytes-1;y=fillShortHoles[bits[++i]]);
x = y|(y-1);
nTrailingZero = 8*i + highbitTable[ x^y ];
sizeOfNextHole = nTrailingZero - nTrailingOnes;
// if size of hole is long enough, return it's low bit rank (0-based)
if( sizeOfNextHole >= 6 ) return nTrailingOnes;
// or return -1 if no more hole
if( sizeOfNextHole == 0 ) return -1;
}
}
EDIT3: Finally, for nzero<=9, a faster way would be to cache the position for each pair of bytes.
int find_n_consecutive_zeros_bypair( unsigned char *bits , int nbytes , int nzero)
{
static int first=1;
static signed char holepositionbypair[8][65536];
signed char position;
int i;
unsigned short x;
if (first)
{
first = 0;
for(i=0;i<8;i++) {
initializeHolePositionByPair( holepositionbypair[i] , i+1 );
}
}
for (i=0 , x=0xFF; i<nbytes; i++) {
x = (x << 8) + bits[i];
if( (position = holepositionbypair[nzero-1][x]) >= 0) return (i-1)*8 + position;
}
return -1;
}
Note the initialization x=0xFF will handle the cases of nbytes<2.
No matter how you fill the cache holepositionbypair, it will be called only at initialization. I propose the arithmetic way of course:
int highbit16(unsigned short c)
{
unsigned short n;
int position = 0;
n = c >> 8;
if( n ) { c=n; position+=8; };
n = c >> 4;
if( n ) { c=n; position+=4; };
n = c >> 2;
if( n ) { c=n; position+=2; };
n = c >> 1;
if( n ) { c=n; position+=1; };
position += c;
return position;
}
unsigned short reversebits16(unsigned short c)
{
c = ((c & 0x00FF) << 8) | ((c & 0xFF00) >> 8);
c = ((c & 0x0F0F) << 4) | ((c & 0xF0F0) >> 4);
c = ((c & 0x3333) << 2) | ((c & 0xCCCC) >> 2);
c = ((c & 0x5555) << 1) | ((c & 0xAAAA) >> 1);
return c;
}
initializeHolePositionByPair(signed char holepositionbypair[65536],int n)
{
int i,n1,n0;
unsigned short x,y;
signed char position;
for(i=0;i<65536;i++) {
position = -1;
x = i;
while(x != 0xFFFF) {
/* remove trailing ones */
y = x&(x+1);
n1 = highbit16(x^y);
/* remove trailing zeros */
x = y|(y-1);
n0 = highbit16(x^y);
if(n0-n1>=n) {
position = n1; break;
}
}
holepositionbypair[reversebits16(i)] = position;
}
}
Upvotes: 1
Reputation: 6181
Let me try - how about:
class bitIterator{
unsigned char* farray;
int foffset;
public:
bitIterator(unsigned char* array, int offset)
:farray(array), foffset(offset)
{}
bool operator*() const {
return (farray[foffset/8] >> (7 - foffset%8)) & 1;
}
bitIterator& operator++(){
foffset++;
return *this;
}
int offset() const {
return foffset;
}
};
// Just to demonstrate how to call it
int find_first_n(unsigned char* buffer, int length, int n){
return std::search_n(bitIterator(buffer, 0), bitIterator(buffer, length*8), n, 0).offset();
}
That was just for fun...
Now if you really want to squeeze some performance out of it, I will suggset
int find_first_n(unsigned char* buffer, int length, int n){
int prev_trail = 0;
unsigned int* buf = reinterpret_cast<unsigned int*>(buffer);
int len = length/sizeof(int);
// Last few bytes will need special treatment if your array is not multple of int-size length.
// However last bytes should not influence overall performance of code - assuming it is used on rather long arrays.
// If you plan using rather short arrays (20-50 bytes?) you will probably be better off just using plain char.
for (int i=0; i<len; i++){
if (!buf[i]) return i*sizeof(int)*8-prev_trail; // __builtin_clz and __builtin_ctz are undefined on zero ;
// EDIT:
if (!~buf[i]){
prev_trail = 0;
continue;
}
// END EDIT!
int shft = __builtin_clz(buf[i]);
if (shft + prev_trail >= n) return i*sizeof(int)*8-prev_trail; // Leading + previous trailing <= n
// Not enough leading zeros, continue search.
prev_trail = __builtin_ctz(buf[i]); // Store it to use for next int in array
unsigned int tmp =0;
while(shft < sizeof(int)*8-prev_trail){ // While we haven't got to trailing zeros in this uint
tmp = buf[i] << shft; // Shift-out leading zeros;
shft += (__builtin_clz(~tmp));
tmp = buf[i] << shft; // Shift-out leading ones;
lead_zeros = __builtin_clz(tmp);
if (lead_zeros >= n) // and see if have enough leading zeros.
return i*sizeof(int)*8+shft;
shft += lead_zeros;
}
return length*8; // Not found :(
}
This is rather difficult to read, but concept is simple - just iterate over every int-sized block, and for each see if leading zeros + trailing zeros from previous one are >= n. If not just repeatedly shift-out leading zeros and following ones (set bits), and check again for trailing zeroes >= n, as long as we didn't got to trailing bytes.
And just one more idea:
int find_first_n(unsigned char* buffer, int length, int n){
union {
unsigned char chr[2];
unsigned int uit;
};
unsigned int onemask = 0x8000;
for (int i = 1 ; i<n; i++) onemask = onemask | (onemask >> 1);
int* masks = new int[8];
for (int i = 0; i<8; i++){
masks[i]=onemask >> i;
}
// generating masks - eg. for n == 3 would be:
// [ 1110 0000 0000 0000 ]
// [ 0111 0000 0000 0000 ]
// [ 0011 1000 0000 0000 ]
// [ 0001 1100 0000 0000 ]
// [ ... ]
// [ 0000 0001 1100 0000 ]
uit = 0;
for (int i=0; i<length-1; i++){
chr[0] = buffer[i];
chr[1] = buffer[i+1];
for (int j=0; j<8; j++){
if (!(uit & masks[j])) return i*8+j;
}
}
chr[0] = buffer[length-1];
chr[1] = 0xff; // Fill with ones at the end.
for (int j=0; j<8; j++){
if (!(uit & masks[j])) return (length-1)*8+j;
}
return length*8; // Not found :(
}
You might be tempted to cast pointer to variables still in buffer directly to word (reinterpret_cast<unsigned short int*>(buffer[i])
) and apply masks without using union. Note however that half of such operations would use unaligned variables, which can degrade performance, and on some platforms even generate exceptions.
Upvotes: 0
Reputation: 264381
How many numbers have 6 consecutive 0 bits (even when considering 2 consecutive bytes)?
Byte 1
XXXXXXXX
000000?? 0/1/2/3
?000000? 0/1/128/129
??000000 0/64/128/192
So if we consider 1 byte at a time then 0/1/2/3/64/128/192
Byte a Byte b
XXXXXXXX XXXXXXXX
??100000 0??????? (a & 31 == 0) && (b & 128 == 0)
???10000 00?????? (a & 15 == 0) && (b & 192 == 0)
????1000 000????? (a & 7 == 0) && (b & 224 == 0)
?????100 0000???? (a & 3 == 0) && (b & 240 == 0)
??????10 00000??? (a & 1 == 0) && (b & 248 == 0)
So an absolute maximum of 12 tests gives you all combinations.
I am sure if you do the comparisons smartly you can reduce that.
If we steel @Michael Burr solution below for using a table driven approach. Then we can organize it so that you can do one or two comparisons per byte.
struct TestStruct
{
bool is6Consecutive;
bool hasTrailing;
int maskNextByte;
int offset;
};
TestStruct testData[256];
std::size_t findOffsetOf_6ConsecutiveZero(char const* data, size_t size)
{
for(int loop = 0;loop < (size-1); ++loop)
{
char const& val = data[loop];
TestStructure const& test = testData[val];
if (test.is6Consecutive)
{ return loop*8 + test.offset;
}
if (test.hasTrailing)
{
if ((data[loop + 1] & test.maskNextByte) == 0)
{ return loop*8 + test.offset;
}
}
}
// Test last byte
TestStructure const& test = testData[data[size-1]];
if (test.is6Consecutive)
{ return (size-1)*8 + test.offset;
}
return -1;
}
First few table entries:
TestStruct testData[256] =
{
{true, false, 0x00, 0},
{true, false, 0x00, 0},
{true, false, 0x00, 0},
{true, false, 0x00, 0},
{false, true, 0xf0, 6}, // 4 => 00000100 <Next 4 bytes are zero we hit>
{false, false, 0x00, 0}, // 5 => 00000101 <Ignore and move on>
{false, true, 0xf8, 7}, // 6 => 00000110 <Next 5 bytes are zero we hit>
etc...
};
Upvotes: 5
Reputation: 340188
Here's a function that matches the output of the one provided in the question (at least under limited testing). It uses a table lookup , with the table having been generated by a one-off script. I'm honestly not sure if its performance is competitive with the suggestions that use bit testing hackery or GCC builtins, but I'll bet it's not too far off.
struct zeros {
unsigned char leading;
unsigned char internal;
unsigned char trailing;
};
// forward declaration so the long, non-interesting table is at the
// end of this
static struct zeros const zero_table[256];
int find_zero_bits_offset( char const* data, int datalen, int desired)
{
int offset = -1;
int found = 0;
char const* dataptr = &data[0];
char const* endptr = &data[datalen];
// first, find which byte the sequence of zeros begins
while (!found && (dataptr != endptr)) {
unsigned char ch1 = *dataptr++;
unsigned char ch2 = (dataptr != endptr) ? *dataptr : 0xff;
int internal = zero_table[ch1].internal;
int trailing = zero_table[ch1].trailing;
int leading = zero_table[ch2].leading;
found = (desired <= internal) ||
(trailing && (desired <= (trailing + leading)));
}
// now zero in on where the sequence starts within the byte
if (found) {
char ch = 0;
unsigned int mask = 0;
--dataptr;
// dataptr points to the byte where the sequence of zeros starts.
// figure out exactly where the sequence is...
// there's possibly some opportunity for optimization, if neccesary,
// by testing if the sequence was found in the "leading", "internal", or
// "trailing" cases. But the explicit loop will only iterate at most
// 8 times (and will early-out on the first iteration if the match is
// for the "leading" case) that it didn't seem too important
ch = *dataptr;
offset = (dataptr - data) * 8;
// figure out where the appropriate internal run starts
// note that the offset we need to return isn't necessarily the
// offset for the run of zeros counted by zero_table[tmp].internal_offset
// since a sufficient shorter run might come first
// there may be a more efficient bithack for this, but the
// loop will iterate at most 8 times...
mask = ((1 << desired) - 1);
mask <<= (8 - desired);
while ((ch & mask) != 0) {
++offset;
mask >>= 1;
}
}
else {
// however you want to handle the "not found" situation.
// This is equivalent to what was in the question:
offset = (endptr - data) * 8;
}
return offset;
}
static struct zeros const zero_table[256] = {
{ 8, 8, 8 }, // 0000 0000
{ 7, 7, 0 }, // 0000 0001
{ 6, 6, 1 }, // 0000 0010
{ 6, 6, 0 }, // 0000 0011
{ 5, 5, 2 }, // 0000 0100
{ 5, 5, 0 }, // 0000 0101
{ 5, 5, 1 }, // 0000 0110
{ 5, 5, 0 }, // 0000 0111
{ 4, 4, 3 }, // 0000 1000
{ 4, 4, 0 }, // 0000 1001
{ 4, 4, 1 }, // 0000 1010
{ 4, 4, 0 }, // 0000 1011
{ 4, 4, 2 }, // 0000 1100
{ 4, 4, 0 }, // 0000 1101
{ 4, 4, 1 }, // 0000 1110
{ 4, 4, 0 }, // 0000 1111
{ 3, 4, 4 }, // 0001 0000
{ 3, 3, 0 }, // 0001 0001
{ 3, 3, 1 }, // 0001 0010
{ 3, 3, 0 }, // 0001 0011
{ 3, 3, 2 }, // 0001 0100
{ 3, 3, 0 }, // 0001 0101
{ 3, 3, 1 }, // 0001 0110
{ 3, 3, 0 }, // 0001 0111
{ 3, 3, 3 }, // 0001 1000
{ 3, 3, 0 }, // 0001 1001
{ 3, 3, 1 }, // 0001 1010
{ 3, 3, 0 }, // 0001 1011
{ 3, 3, 2 }, // 0001 1100
{ 3, 3, 0 }, // 0001 1101
{ 3, 3, 1 }, // 0001 1110
{ 3, 3, 0 }, // 0001 1111
{ 2, 5, 5 }, // 0010 0000
{ 2, 4, 0 }, // 0010 0001
{ 2, 3, 1 }, // 0010 0010
{ 2, 3, 0 }, // 0010 0011
{ 2, 2, 2 }, // 0010 0100
{ 2, 2, 0 }, // 0010 0101
{ 2, 2, 1 }, // 0010 0110
{ 2, 2, 0 }, // 0010 0111
{ 2, 3, 3 }, // 0010 1000
{ 2, 2, 0 }, // 0010 1001
{ 2, 2, 1 }, // 0010 1010
{ 2, 2, 0 }, // 0010 1011
{ 2, 2, 2 }, // 0010 1100
{ 2, 2, 0 }, // 0010 1101
{ 2, 2, 1 }, // 0010 1110
{ 2, 2, 0 }, // 0010 1111
{ 2, 4, 4 }, // 0011 0000
{ 2, 3, 0 }, // 0011 0001
{ 2, 2, 1 }, // 0011 0010
{ 2, 2, 0 }, // 0011 0011
{ 2, 2, 2 }, // 0011 0100
{ 2, 2, 0 }, // 0011 0101
{ 2, 2, 1 }, // 0011 0110
{ 2, 2, 0 }, // 0011 0111
{ 2, 3, 3 }, // 0011 1000
{ 2, 2, 0 }, // 0011 1001
{ 2, 2, 1 }, // 0011 1010
{ 2, 2, 0 }, // 0011 1011
{ 2, 2, 2 }, // 0011 1100
{ 2, 2, 0 }, // 0011 1101
{ 2, 2, 1 }, // 0011 1110
{ 2, 2, 0 }, // 0011 1111
{ 1, 6, 6 }, // 0100 0000
{ 1, 5, 0 }, // 0100 0001
{ 1, 4, 1 }, // 0100 0010
{ 1, 4, 0 }, // 0100 0011
{ 1, 3, 2 }, // 0100 0100
{ 1, 3, 0 }, // 0100 0101
{ 1, 3, 1 }, // 0100 0110
{ 1, 3, 0 }, // 0100 0111
{ 1, 3, 3 }, // 0100 1000
{ 1, 2, 0 }, // 0100 1001
{ 1, 2, 1 }, // 0100 1010
{ 1, 2, 0 }, // 0100 1011
{ 1, 2, 2 }, // 0100 1100
{ 1, 2, 0 }, // 0100 1101
{ 1, 2, 1 }, // 0100 1110
{ 1, 2, 0 }, // 0100 1111
{ 1, 4, 4 }, // 0101 0000
{ 1, 3, 0 }, // 0101 0001
{ 1, 2, 1 }, // 0101 0010
{ 1, 2, 0 }, // 0101 0011
{ 1, 2, 2 }, // 0101 0100
{ 1, 1, 0 }, // 0101 0101
{ 1, 1, 1 }, // 0101 0110
{ 1, 1, 0 }, // 0101 0111
{ 1, 3, 3 }, // 0101 1000
{ 1, 2, 0 }, // 0101 1001
{ 1, 1, 1 }, // 0101 1010
{ 1, 1, 0 }, // 0101 1011
{ 1, 2, 2 }, // 0101 1100
{ 1, 1, 0 }, // 0101 1101
{ 1, 1, 1 }, // 0101 1110
{ 1, 1, 0 }, // 0101 1111
{ 1, 5, 5 }, // 0110 0000
{ 1, 4, 0 }, // 0110 0001
{ 1, 3, 1 }, // 0110 0010
{ 1, 3, 0 }, // 0110 0011
{ 1, 2, 2 }, // 0110 0100
{ 1, 2, 0 }, // 0110 0101
{ 1, 2, 1 }, // 0110 0110
{ 1, 2, 0 }, // 0110 0111
{ 1, 3, 3 }, // 0110 1000
{ 1, 2, 0 }, // 0110 1001
{ 1, 1, 1 }, // 0110 1010
{ 1, 1, 0 }, // 0110 1011
{ 1, 2, 2 }, // 0110 1100
{ 1, 1, 0 }, // 0110 1101
{ 1, 1, 1 }, // 0110 1110
{ 1, 1, 0 }, // 0110 1111
{ 1, 4, 4 }, // 0111 0000
{ 1, 3, 0 }, // 0111 0001
{ 1, 2, 1 }, // 0111 0010
{ 1, 2, 0 }, // 0111 0011
{ 1, 2, 2 }, // 0111 0100
{ 1, 1, 0 }, // 0111 0101
{ 1, 1, 1 }, // 0111 0110
{ 1, 1, 0 }, // 0111 0111
{ 1, 3, 3 }, // 0111 1000
{ 1, 2, 0 }, // 0111 1001
{ 1, 1, 1 }, // 0111 1010
{ 1, 1, 0 }, // 0111 1011
{ 1, 2, 2 }, // 0111 1100
{ 1, 1, 0 }, // 0111 1101
{ 1, 1, 1 }, // 0111 1110
{ 1, 1, 0 }, // 0111 1111
{ 0, 7, 7 }, // 1000 0000
{ 0, 6, 0 }, // 1000 0001
{ 0, 5, 1 }, // 1000 0010
{ 0, 5, 0 }, // 1000 0011
{ 0, 4, 2 }, // 1000 0100
{ 0, 4, 0 }, // 1000 0101
{ 0, 4, 1 }, // 1000 0110
{ 0, 4, 0 }, // 1000 0111
{ 0, 3, 3 }, // 1000 1000
{ 0, 3, 0 }, // 1000 1001
{ 0, 3, 1 }, // 1000 1010
{ 0, 3, 0 }, // 1000 1011
{ 0, 3, 2 }, // 1000 1100
{ 0, 3, 0 }, // 1000 1101
{ 0, 3, 1 }, // 1000 1110
{ 0, 3, 0 }, // 1000 1111
{ 0, 4, 4 }, // 1001 0000
{ 0, 3, 0 }, // 1001 0001
{ 0, 2, 1 }, // 1001 0010
{ 0, 2, 0 }, // 1001 0011
{ 0, 2, 2 }, // 1001 0100
{ 0, 2, 0 }, // 1001 0101
{ 0, 2, 1 }, // 1001 0110
{ 0, 2, 0 }, // 1001 0111
{ 0, 3, 3 }, // 1001 1000
{ 0, 2, 0 }, // 1001 1001
{ 0, 2, 1 }, // 1001 1010
{ 0, 2, 0 }, // 1001 1011
{ 0, 2, 2 }, // 1001 1100
{ 0, 2, 0 }, // 1001 1101
{ 0, 2, 1 }, // 1001 1110
{ 0, 2, 0 }, // 1001 1111
{ 0, 5, 5 }, // 1010 0000
{ 0, 4, 0 }, // 1010 0001
{ 0, 3, 1 }, // 1010 0010
{ 0, 3, 0 }, // 1010 0011
{ 0, 2, 2 }, // 1010 0100
{ 0, 2, 0 }, // 1010 0101
{ 0, 2, 1 }, // 1010 0110
{ 0, 2, 0 }, // 1010 0111
{ 0, 3, 3 }, // 1010 1000
{ 0, 2, 0 }, // 1010 1001
{ 0, 1, 1 }, // 1010 1010
{ 0, 1, 0 }, // 1010 1011
{ 0, 2, 2 }, // 1010 1100
{ 0, 1, 0 }, // 1010 1101
{ 0, 1, 1 }, // 1010 1110
{ 0, 1, 0 }, // 1010 1111
{ 0, 4, 4 }, // 1011 0000
{ 0, 3, 0 }, // 1011 0001
{ 0, 2, 1 }, // 1011 0010
{ 0, 2, 0 }, // 1011 0011
{ 0, 2, 2 }, // 1011 0100
{ 0, 1, 0 }, // 1011 0101
{ 0, 1, 1 }, // 1011 0110
{ 0, 1, 0 }, // 1011 0111
{ 0, 3, 3 }, // 1011 1000
{ 0, 2, 0 }, // 1011 1001
{ 0, 1, 1 }, // 1011 1010
{ 0, 1, 0 }, // 1011 1011
{ 0, 2, 2 }, // 1011 1100
{ 0, 1, 0 }, // 1011 1101
{ 0, 1, 1 }, // 1011 1110
{ 0, 1, 0 }, // 1011 1111
{ 0, 6, 6 }, // 1100 0000
{ 0, 5, 0 }, // 1100 0001
{ 0, 4, 1 }, // 1100 0010
{ 0, 4, 0 }, // 1100 0011
{ 0, 3, 2 }, // 1100 0100
{ 0, 3, 0 }, // 1100 0101
{ 0, 3, 1 }, // 1100 0110
{ 0, 3, 0 }, // 1100 0111
{ 0, 3, 3 }, // 1100 1000
{ 0, 2, 0 }, // 1100 1001
{ 0, 2, 1 }, // 1100 1010
{ 0, 2, 0 }, // 1100 1011
{ 0, 2, 2 }, // 1100 1100
{ 0, 2, 0 }, // 1100 1101
{ 0, 2, 1 }, // 1100 1110
{ 0, 2, 0 }, // 1100 1111
{ 0, 4, 4 }, // 1101 0000
{ 0, 3, 0 }, // 1101 0001
{ 0, 2, 1 }, // 1101 0010
{ 0, 2, 0 }, // 1101 0011
{ 0, 2, 2 }, // 1101 0100
{ 0, 1, 0 }, // 1101 0101
{ 0, 1, 1 }, // 1101 0110
{ 0, 1, 0 }, // 1101 0111
{ 0, 3, 3 }, // 1101 1000
{ 0, 2, 0 }, // 1101 1001
{ 0, 1, 1 }, // 1101 1010
{ 0, 1, 0 }, // 1101 1011
{ 0, 2, 2 }, // 1101 1100
{ 0, 1, 0 }, // 1101 1101
{ 0, 1, 1 }, // 1101 1110
{ 0, 1, 0 }, // 1101 1111
{ 0, 5, 5 }, // 1110 0000
{ 0, 4, 0 }, // 1110 0001
{ 0, 3, 1 }, // 1110 0010
{ 0, 3, 0 }, // 1110 0011
{ 0, 2, 2 }, // 1110 0100
{ 0, 2, 0 }, // 1110 0101
{ 0, 2, 1 }, // 1110 0110
{ 0, 2, 0 }, // 1110 0111
{ 0, 3, 3 }, // 1110 1000
{ 0, 2, 0 }, // 1110 1001
{ 0, 1, 1 }, // 1110 1010
{ 0, 1, 0 }, // 1110 1011
{ 0, 2, 2 }, // 1110 1100
{ 0, 1, 0 }, // 1110 1101
{ 0, 1, 1 }, // 1110 1110
{ 0, 1, 0 }, // 1110 1111
{ 0, 4, 4 }, // 1111 0000
{ 0, 3, 0 }, // 1111 0001
{ 0, 2, 1 }, // 1111 0010
{ 0, 2, 0 }, // 1111 0011
{ 0, 2, 2 }, // 1111 0100
{ 0, 1, 0 }, // 1111 0101
{ 0, 1, 1 }, // 1111 0110
{ 0, 1, 0 }, // 1111 0111
{ 0, 3, 3 }, // 1111 1000
{ 0, 2, 0 }, // 1111 1001
{ 0, 1, 1 }, // 1111 1010
{ 0, 1, 0 }, // 1111 1011
{ 0, 2, 2 }, // 1111 1100
{ 0, 1, 0 }, // 1111 1101
{ 0, 1, 1 }, // 1111 1110
{ 0, 0, 0 }, // 1111 1111
};
Upvotes: 3
Reputation: 60858
Suppose you're looking for exactly 6 consecutive zeros. You could use a code snippet like this:
unsigned d1 = 0xff;
for (int i = 0; i < dataLen; i += 3) {
unsigned d1 = (d1 << 24) | (data[i] << 16) | (data [i+1] << 8) | (data[i+2]);
unsigned d2 = ~d1; // ones
unsigned d3 = d2 & (d2 << 2) & (d2 << 4);
unsigned d4 = d3 & (d3 << 1);
if (!d4) continue;
doSomethingWithTheSequence(i, d4);
}
d1
will combine the last byte of the previous run with three new bytes. So you can iterate over your data in multiples of 3. You could try multiples of 2, which might be more efficient as you can treat your data as 16 bit atomic quantities. You could also try multiples of 4 and do the subsequent computation on 64 bit numbers, particularly if you're on 64 bit platform. Or you introduce a special case for zero sequences spanning bytes.
d2
reverses the bit pattern, which is useful because shifts will introduce artificial zeros but never artificial ones. d3
looks for three matching ones, at offsets 0, 2 and 4. d4
then adds another bit of offset, thus combining all offsets from 0 through 5. So d4
will be non-zero if and only if there is a run of 6 consecutive zeros in d1
. You can then use __builtin_clz
to identify the location of the most significant one in d4
, which is also the position of the first of these 6 bits in d1
. From that you can get the position in data
.
You can adapt the code to other run lengths, either by adding a loop and hoping that the compiler will optimize it away, or by providing an inline template function which computes d4
from d2
in a way suitable for the desired run length.
Upvotes: 0
Reputation: 37214
For a byte, you only need to do 3 tests:
if( (byte & 0x3F) == 0) { /* Found */ }
if( (byte & 0x7E) == 0) { /* Found */ }
if( (byte & 0xFC) == 0) { /* Found */ }
It should be relatively easy to expand this to wider values. For example, for uint32_t
:
tempValue1 = value & 0x3F3F3F3F;
tempValue2 = value & 0x7E7E7E7E;
tempValue3 = value & 0xFCFCFCFC;
if( ( (uint8_t)tempValue1 == 0) { /* Found */ }
if( ( (uint8_t)tempValue2 == 0) { /* Found */ }
if( ( (uint8_t)tempValue3 == 0) { /* Found */ }
tempValue1 >>= 8;
tempValue2 >>= 8;
tempValue3 >>= 8;
if( ( (uint8_t)tempValue1 == 0) { /* Found */ }
if( ( (uint8_t)tempValue2 == 0) { /* Found */ }
if( ( (uint8_t)tempValue3 == 0) { /* Found */ }
tempValue1 >>= 8;
tempValue2 >>= 8;
tempValue3 >>= 8;
if( ( (uint8_t)tempValue1 == 0) { /* Found */ }
if( ( (uint8_t)tempValue2 == 0) { /* Found */ }
if( ( (uint8_t)tempValue3 == 0) { /* Found */ }
tempValue1 >>= 8;
tempValue2 >>= 8;
tempValue3 >>= 8;
if( ( (uint8_t)tempValue1 == 0) { /* Found */ }
if( ( (uint8_t)tempValue2 == 0) { /* Found */ }
if( ( (uint8_t)tempValue3 == 0) { /* Found */ }
The above code doesn't look like the best way of doing it (because it probably isn't). I deliberate wrote it like this to make it easier to see how it can be done with SSE. For SSE, it'd be similar to above (but wider).
However, for SSE you can do all the comparisons in parallel and get rid of lots of branches. For example, you could AND with each of the 3 masks and use PCMPEQB
3 times and then OR these results together, then do one PMOVMSKB
; which would give you a 16 bit value (that represents 16 bytes - one bit per source byte) that can be tested with a single if(result_flags == 0) { /* None of the 16 bytes matched */ }
, where this final test could be at the end of a "do/while" loop.
Upvotes: 0
Reputation: 1454
Iterate through the array word by word (32-bit or 64-bit depend on your arch). Use __builtin_clz
and __builtin_ctz
to calculate the leading and trailing zeros for each word.
There are two cases of consecutive zeros:
The first case is easy to check. The second case requires to check if leading zeros of this item + trailing zeros of previous item is >= 6.
Upvotes: 2
Reputation: 20980
Here, try this code.
int dataLen = 500;
char data[dataLen];
//Get the data in place with whatever is your algorithm.
int i,j;
unsigned int dataSample;
unsigned int mask;
for(i=0;i<dataLen-1;i++){
dataSample=((unsigned int)(data[i+1]) << 8) | (unsigned int) (data[i]);
mask=(1<<6) - 1 ; //6 consecutive 1's
for(j=0;j<8;j++){
if((dataSample & (mask << j)) == 0){
printf("Found consecutive 6 zeros at byte %d offset %d\n",i,j);
j+=5; // Followed by j++ makes it j+=6.
}
}
}
Upvotes: 0