Reputation: 187
I need to search for a 8-bit aligned bit string "00 00 01" (hex) in a character stream. On a typical architecture, I would do it as such:
char *find(char *first, char *last)
{
char pattern[] = {0, 0, 1};
char *p;
for (p = first; last - p >= sizeof(pattern); ++p) {
if (!memcmp(p, pattern, sizeof(pattern))
return p;
}
return 0;
}
However, I don't know how I would implement this function (with good performance) if char was not 8-bits.
Upvotes: 2
Views: 228
Reputation: 408
The task is quite interesting one, so I'm coming with another option. It doesn't require to convert your stream bits into chars, instead we could use following pattern.
Since your bit values shall have 8 bit alignment, possible char index / it's start bit options are:
char 0, bit 0 (its starting bit index)
char 0, bit 8
char 1, bit 7
char 2, bit 6
char 3, bit 5
char 4, bit 4
char 5, bit 3
char 6, bit 2
char 7, bit 1
for the char 8, the starting bit would be 0, so it same as 1st item (char 0, bit 0)
Now except first position, remaining 8 possibilities are easy to verify by single expression:
The pseudo code:
int pattern = 0x000001L;
int mask = ~pattern;
int char_idx = 0;
while (first <= last-2) // need to compare 3 chars
{
int value = *((int*)first)); // this will actually access 4 chars, if stream has no 0 terminator, it will produce exception
// special [char 0, bit 0] case
if ( !char_idx && (value & mask) == pattern )
{
// match! do something with *first
}
if ( ((value >> (8 - char_idx)) & mask) == pattern )
{
// match! do something with *first
}
if ( ++char_idx == 9 )
char_idx = 0;
first++;
}
NOTE: if your int is not 36 bit, you can do char-by-char comparision
Upvotes: 2
Reputation: 11018
The following code should work under the following conditions:
long
is 36 bits (4 bytes of 9 bits each)long
value are stored at the lowest addresslong *
can point to any address, not necessarily a multiple of four; in other words, no word or dword alignmentThe function does not return a char *
since that doesn't say much about the actual bit position. Instead, it returns the number of 8-bit groups that precede the match, or -1 if no match.
long find(char *first, char *last)
{
long pattern = 0x000001L; // the bit string we are searching for
long bitmask = -0x1000L; // initial mask: 24 ones followed by 12 zeroes
long maxcount = ((last - first) * 9 - 24) / 8; // 24 = pattern size (bits)
long count; // counts the 8-bit groups
char *slider = first; // follows the 9-bit bytes
for (count = 0; count <= maxcount; count++) {
long actual = (*(long *)slider & bitmask);
long expect = (bitmask & -bitmask) * pattern;
if (actual == expect) return count;
if (bitmask & 0xFF) { // less than 8 zeroes on the right-hand side
slider++;
bitmask <<= 1; // shift 9 bits to left, then 8 bits to right
}
else {
bitmask >>= 8; // shift 8 bits to the right, only
}
}
return -1;
}
I have no idea how to test this, so it's on an 'as is' basis.
The function uses a bitmask
with exactly 24 ones. The bits are continuously shifted 8 positions to the right. If a '1' threatens to be shifted out, then the memory pointer slider
is incremented and the bitmask
is adjusted accordingly.
slider
is defined as char *
, and cast to long *
when being dereferenced, retrieving four 9-bit bytes in one go. If I would define slider
as long *
, then slider++
would proceed the pointer by 4 bytes instead of one.
Here's an example to explain this obscure expression: (bitmask & -bitmask) * pattern
bitmask
-bitmask
(bitmask & -bitmask)
(bitmask & -bitmask) * pattern
As you can see, it aligns pattern
(the 24 bits pppppppppppppppppppppppp
) with the bitmask
.
Please let me know how this works out for you.
Upvotes: 1
Reputation: 408
What the hell is that, the 9-bit architecture? :) So the 'char' type is also 9 bit? :)
The quick and dirty way would be to convert char stream into bit-to-char representation, in other words every char representing a bit. Then just search for substring of "000001", with the alignment of 8 chars (memcmp at first[0], then memcp at first[8] etc)... Surely it is possible to do it in a 'binary'/right way, but depending on how long the stream is this could be the 'good performance' way...
Upvotes: 0