Reputation: 911
This is an interview question:
You are given a char variable named ch. You know that it represents a number whose binary form has exactly one of its eight bits equal to '1'; i.e., the only possible values for ch are: 0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80.
Given the variable ch, I need to write the most efficient code to get the index of that '1' bit. For example: if ch == 0x1, the result is 0; if ch == 0x4, the result is 2.
The obvious way is to use switch-case, but I need something more efficient.
Is there any bit manipulation you can do here for efficient implementation?
Upvotes: 7
Views: 2137
Reputation: 144695
The simplest solution might not be the fastest, but only profiling against other solutions will let you determine that, and only for a given architecture and compiler.
Here is a very simple solution:
#include <math.h>
int leadingbit(unsigned char c) {
return log2(c);
}
Here is a solution with a lookup table:
int leadingbit(unsigned char c) {
#define N(x) ((076543210 / ((x) | !(x)) / ((x) | !(x)) / ((x) | !(x))) & 7) /* (x) | !(x) is 1 for x == 0, avoiding a division by zero */
#define N8(x) N(x), N(x+1), N(x+2), N(x+3), N(x+4), N(x+5), N(x+6), N(x+7)
#define N32(x) N8(x), N8(x+8), N8(x+16), N8(x+24)
static unsigned char table[256] = {
N32(0), N32(32), N32(64), N32(96), N32(128), N32(160), N32(192), N32(224),
};
#undef N
#undef N8
#undef N32
return table[c];
}
Here is one inspired by Matt Timmermans without a memory reference:
int leadingbit(unsigned char c) {
int n = c - 1;
n = ((n & 0xAA) >> 1) + (n & 0x55); //sums of pairs of bits
n = ((n & 0xCC) >> 2) + (n & 0x33); //sums of 4s of bits
return ((n >> 4) + n) & 7;
}
Here is one using the non-portable __builtin_clz() function (count leading zeroes), available in GCC and Clang:
#include <limits.h>
int leadingbit(unsigned char c) {
    return CHAR_BIT * sizeof(unsigned) - 1 - __builtin_clz((unsigned)c);
}
Note that all of the above assume that c is a power of 2; the behavior for other values is potentially undefined. You can check that c is a power of 2 with a simple expression:
if (c && !(c & (c - 1))) {
/* c is a power of 2 */
}
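As a minimal sketch of how this check might be combined with one of the functions above, the hypothetical wrapper below validates its input before applying the bit-counting variant. The name checked_bit_index and the -1 error convention are assumptions made for illustration, not part of the original answer.

```c
/* Returns the index (0..7) of the single set bit in c,
 * or -1 if c does not have exactly one bit set.
 * The name and the -1 convention are illustrative assumptions. */
int checked_bit_index(unsigned char c) {
    if (c == 0 || (c & (c - 1)) != 0)
        return -1;                          /* zero, or more than one bit set */
    int n = c - 1;                          /* all ones below the set bit */
    n = ((n & 0xAA) >> 1) + (n & 0x55);     /* sums of pairs of bits */
    n = ((n & 0xCC) >> 2) + (n & 0x33);     /* sums of 4s of bits */
    return ((n >> 4) + n) & 7;              /* number of ones == bit index */
}
```

The validation costs a compare and a branch, so it probably belongs at an API boundary rather than in a hot loop.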
Upvotes: 0
Reputation: 26085
The type char may be either signed or unsigned (implementation-defined behavior). In order to safely operate on the value 0x80 we should operate explicitly with unsigned char data.
I assume that there are no special functions available that give us the bit position more or less directly, such as ffs() (find first set), clz() (count leading zeros), or popcount() (population count), and that we are to determine the bit position using just standard ISO C.
One approach is to spread each bit position in ch to a separate nibble (a four-bit group), then perform an in-register table lookup, where each table element comprises one nibble in a 32-bit int.
The expansion can be accomplished by squaring the input twice, which moves bit [i] to bit [4*i]. The code below then uses a special trick to allow extraction of the table element with a multiply and a right shift, where the multiply moves the desired table entry into bits [31:28] of the intermediate result. Note that the table is specified in a readable way and equates to the constant 0x01234567, a substitution every reasonable compiler will make.
Compiler Explorer (Godbolt) shows that most of the execution time cost of uchar_bitpos() is three dependent integer multiplies plus a couple of other instructions.
This code assumes 8-bit char and 32-bit int. For better portability, unsigned char variables could be turned into uint8_t variables and unsigned int variables could be turned into uint32_t variables.
#include <stdio.h>
#include <stdlib.h>
int uchar_bitpos (unsigned char ch)
{
unsigned int ch_pow2, ch_pow4;
const unsigned int table =
((0 << 28) | (1 << 24) | (2 << 20) | (3 << 16) |
(4 << 12) | (5 << 8) | (6 << 4) | (7 << 0));
ch_pow2 = ch * ch;
ch_pow4 = ch_pow2 * ch_pow2;
return (ch_pow4 * table) >> 28;
}
int main (void)
{
unsigned char a = 0x80;
do {
printf ("a = %2x bitpos=%d\n", a, uchar_bitpos (a));
a = a / 2;
} while (a);
return EXIT_SUCCESS;
}
The output of the above program should look as follows:
a = 80 bitpos=7
a = 40 bitpos=6
a = 20 bitpos=5
a = 10 bitpos=4
a =  8 bitpos=3
a =  4 bitpos=2
a =  2 bitpos=1
a =  1 bitpos=0
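The portability suggestion above (uint8_t and uint32_t) might look like the following sketch. The function name uchar_bitpos_u32 is made up for illustration. The comments trace the value 0x04 through each step.

```c
#include <stdint.h>

/* Same computation as uchar_bitpos(), written with fixed-width types.
 * The function name is an illustrative assumption. */
int uchar_bitpos_u32(uint8_t ch) {
    const uint32_t table = UINT32_C(0x01234567); /* k-th nibble from the top holds k */
    uint32_t p2 = (uint32_t)ch * ch;   /* bit i -> bit 2*i,   e.g. 0x04 -> 0x10  */
    uint32_t p4 = p2 * p2;             /* bit 2*i -> bit 4*i, e.g. 0x10 -> 0x100 */
    /* Multiplying by 2^(4*i) shifts the table left by i nibbles (mod 2^32),
     * e.g. 0x100 * 0x01234567 -> 0x23456700, whose top nibble is 2. */
    return (int)((p4 * table) >> 28);
}
```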
Upvotes: 3
Reputation: 8534
write the most efficient code to get the index of that '1' bit.
The most efficient code would be to somehow map the value of ch to its bit index, i.e.:
0x01 -> 0
0x02 -> 1
0x04 -> 2
0x08 -> 3
...
The simplest and most naive solution would require a lookup in a mapping table with all possible values of ch. For 8-bit numbers (char) we need a table with 2^8 = 256 elements:
char naive_table[256];
naive_table[0x01] = 0;
naive_table[0x02] = 1;
naive_table[0x04] = 2;
naive_table[0x08] = 3;
naive_table[0x10] = 4;
naive_table[0x20] = 5;
naive_table[0x40] = 6;
naive_table[0x80] = 7;
The lookup in this table is also simple:
index = naive_table[ch];
The previous solution is simple and fast, but most of the elements of naive_table are wasted. Taking into account that ch is a power of two, for any n-bit number there are just n possible indexes.
So, instead of using a mapping table with 2^8 elements, we could use a table with just 8 elements and a hash function which would map the value of ch to a unique index of the mapping table.
The perfect candidate for such a hash function would be a function using the de Bruijn sequence. There is a paper "Using de Bruijn Sequences to Index a 1 in a Computer Word" which states:
A length-n de Bruijn sequence, where n is an exact power of 2, is a cyclic sequence of n 0's and 1's such that every 0-1 sequence of length lg n occurs exactly once as a contiguous substring. For example, a length-8 de Bruijn sequence is 00011101. Each 3-bit number occurs exactly once as a contiguous substring: starting from the leftmost 3 bits and moving a 3-bit window right one bit at a time, we have 000, 001, 011, 111, 110, 101, 010 (wrapping around), 100 (also wrapping around).
The hash function is computed by: h(x)=(x * deBruijn)>>(n - lg n)
So, let us try this hash function to get a unique index in our compact lookup table:
h(ch) = ((ch * 00011101b) >> (8 - 3)) & 0x7
h(ch) = ((ch * 29) >> 5) & 0x7
Let us calculate the hashes for all values of ch and make sure the hash function works as expected, i.e. all the hashes are unique:
ch h(ch)
0x01 ((1 * 29) >> 5) & 0x7 = 0
0x02 ((2 * 29) >> 5) & 0x7 = 1
0x04 ((4 * 29) >> 5) & 0x7 = 3
0x08 ((8 * 29) >> 5) & 0x7 = 7
0x10 ((16 * 29) >> 5) & 0x7 = 6
0x20 ((32 * 29) >> 5) & 0x7 = 5
0x40 ((64 * 29) >> 5) & 0x7 = 2
0x80 ((128 * 29) >> 5) & 0x7 = 4
So the hash function works fine and produces unique hashes for each power of two value of ch.
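The uniqueness claim can also be checked mechanically. The sketch below uses two hypothetical helpers, debruijn_hash and hashes_are_unique (names chosen here for illustration), to compute the hash for each of the eight single-bit values and confirm that no slot is hit twice.

```c
/* De Bruijn hash from the text: h(ch) = ((ch * 29) >> 5) & 7. */
unsigned debruijn_hash(unsigned char ch) {
    return ((ch * 29u) >> 5) & 7u;
}

/* Returns 1 if the eight single-bit values hash to eight distinct slots,
 * 0 if any two of them collide. */
int hashes_are_unique(void) {
    unsigned seen = 0;                       /* bit k set => slot k already used */
    for (unsigned i = 0; i < 8; i++) {
        unsigned h = debruijn_hash((unsigned char)(1u << i));
        if (seen & (1u << h))
            return 0;                        /* collision */
        seen |= 1u << h;
    }
    return seen == 0xFFu;                    /* all eight slots used exactly once */
}
```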
Now let us create a compact mapping table using the hash values from the table above:
char compact_table[8];
compact_table[0] = 0;
compact_table[1] = 1;
compact_table[3] = 2;
compact_table[7] = 3;
compact_table[6] = 4;
compact_table[5] = 5;
compact_table[2] = 6;
compact_table[4] = 7;
Now for the lookup we use a hash value as an index:
h = ((ch * 29) >> 5) & 0x7;
index = compact_table[h];
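Put together as a single function, the compact-table lookup might look like this sketch. The function name bit_index_compact is an assumption. The table contents are the assignments listed above, written as a static initializer ordered by hash value.

```c
/* Index of the single set bit in ch, via the 8-entry table from the text.
 * The result is only meaningful when ch has exactly one bit set. */
int bit_index_compact(unsigned char ch) {
    /* compact_table[h] holds the bit index whose hash value is h */
    static const unsigned char compact_table[8] = { 0, 1, 6, 2, 7, 5, 4, 3 };
    unsigned h = ((ch * 29u) >> 5) & 7u;
    return compact_table[h];
}
```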
The previous version is nearly perfect: there are no more wasted elements in the mapping table. But since all the indexes are within 0-7 (i.e. just 3-bit values), there is still room for improvement. Let us use a bit string instead of the mapping table so the most significant bits of each element are not wasted.
First, let us create such a bit string using all the values of ch and the hash values from the previous version:
ch h(ch) index
0x01 0 0 (000b)
0x02 1 1 (001b)
0x04 3 2 (010b)
0x08 7 3 (011b)
0x10 6 4 (100b)
0x20 5 5 (101b)
0x40 2 6 (110b)
0x80 4 7 (111b)
Now let us order this table by the hash value:
ch h(ch) index
0x01 0 0 (000b)
0x02 1 1 (001b)
0x40 2 6 (110b)
0x04 3 2 (010b)
0x80 4 7 (111b)
0x20 5 5 (101b)
0x10 6 4 (100b)
0x08 7 3 (011b)
So the bit string will be a reversed concatenation of those 3-bit indexes:
011 100 101 111 010 110 001 000 = 0x72f588
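As a cross-check on that constant, the packing can be done programmatically. This is a sketch with a made-up function name, build_bit_string. It places each 3-bit index at bit offset 3*h, where h is the hash of the corresponding power of two.

```c
/* Packs the eight 3-bit indexes into one word so that the index for
 * hash value h occupies bits [3*h+2 : 3*h]. */
unsigned long build_bit_string(void) {
    unsigned long bits = 0;
    for (unsigned index = 0; index < 8; index++) {
        unsigned ch = 1u << index;               /* the power of two for this index */
        unsigned h = ((ch * 29u) >> 5) & 7u;     /* its de Bruijn hash */
        bits |= (unsigned long)index << (3 * h); /* store index in slot h */
    }
    return bits;
}
```

The result should match the hand-derived value 0x72f588.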
Now let us lookup in this bit string just like we did previously. Note that our indexes are 3-bit, so we need to multiply our hash value by 3:
h = ((ch * 29) >> 5) & 0x7; // just like before
bit_string = 0x72f588;
index = (bit_string >> (h * 3)) & 0x7;
Or in short:
index = (0x72f588 >> ((((ch * 29) >> 5) & 0x7) * 3)) & 0x7;
There are no divisions/modulos/conditions in the code, so it should perform fast on any CPU.
The proof-of-concept code:
#include <stdio.h>
int main(void) {
    unsigned char ch;
    /* ch <<= 1 wraps to 0 after 0x80, which ends the loop */
    for (ch = 1; ch; ch <<= 1) {
        int index = (0x72f588 >> ((((ch * 29) >> 5) & 7) * 3)) & 7;
        printf("ch = 0x%02x index = %d\n", ch, index);
    }
    return 0;
}
Upvotes: 2
Reputation: 71
If you have only one bit set to 1, that means it is a power of 2. You can directly get the index by taking the log of ch. You have to use the base-2 logarithm, of course.
Upvotes: 0
Reputation: 3930
A fast and quite portable solution is:
int charindex(unsigned char c){
union { /* Assume both float and int are 32 bits, assume IEEE 754 floating point. */
int i;
float f;
} x;
x.f = (float)c;
return (x.i >> 23) - 127;
}
Note that many processors have hardware support for counting the number of leading or trailing zeros of an integer. With gcc it is easy to access these particular instructions: gcc has the builtin function __builtin_ctz(), which is probably more efficient than charindex on platforms with suitable hardware support.
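A minimal sketch of the __builtin_ctz() variant follows. It assumes GCC or Clang, since the builtin is a compiler extension rather than ISO C, and the function name charindex_ctz is made up here.

```c
/* Requires GCC or Clang: __builtin_ctz is a compiler extension, not ISO C.
 * Its result is undefined when the argument is 0, so c must be nonzero. */
int charindex_ctz(unsigned char c) {
    return __builtin_ctz((unsigned)c);   /* trailing zeros == index of lowest set bit */
}
```

Whether this compiles to a single instruction depends on the target and on compiler flags.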
Upvotes: 1
Reputation: 3315
A few methods that are not going to be hyper-efficient (depending on your definition of efficiency).
Loop and shift method.
int ch = 32;
int i;
for ( i=1;ch >>i ; i++)
printf("%i %i \n",i, ch>>i);
printf("Final index:%i\n",i-1);
Calling math.h log2
int l=log2((double)ch);
printf("math log2:%i\n",l);
More efficient: For a single lookup it is probably difficult to beat AnT's version. But for repeated lookups, a lookup table might perform better.
int ltable[256]= { -1 }; /* only ltable[0] is -1; all other entries are zero-initialized */
void initTable()
{
ltable[0x01]=0;
ltable[0x02]=1;
ltable[0x04]=2;
ltable[0x08]=3;
ltable[0x10]=4;
ltable[0x20]=5;
ltable[0x40]=6;
ltable[0x80]=7;
}
int lookup(size_t ch)
{
return ltable[ch];
}
Table init ASM
init():
push rbp
mov rbp, rsp
mov DWORD PTR ltable[rip+4], 0
mov DWORD PTR ltable[rip+8], 1
mov DWORD PTR ltable[rip+16], 2
mov DWORD PTR ltable[rip+32], 3
mov DWORD PTR ltable[rip+64], 4
mov DWORD PTR ltable[rip+128], 5
mov DWORD PTR ltable[rip+256], 6
mov DWORD PTR ltable[rip+512], 7
nop
pop rbp
ret
Table lookup ASM
lookup(unsigned long):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR ltable[0+rax*4]
pop rbp
ret
Outputs
1 16
2 8
3 4
4 2
5 1
Final index:5
math log2:5
Lookup[32]=>5
Upvotes: 0
Reputation: 320401
An unsigned char variable is supposedly only 8 bits wide. In order to encode the position of the bit we need only 3 bits. That means that we can build a 24-bit "table" that contains all 8 possible 3-bit answers in their natural order:
111 110 101 100 011 010 001 000 = 0xFAC688
If your variable ch is known to contain only one 1 bit, then it is a power of 2. Dividing something by ch will right-shift the original value by the index of your 1 bit. So, if we divide the above "table" by your ch three times, the answer will get shifted to the lowest 3 bits of the result:
unsigned position = (0xFAC688 / ch / ch / ch) & 0x7;
End of story. The above could probably be rewritten more efficiently, while preserving the general principle.
Note that this is basically the same principle that's used in the approaches based on De Bruijn sequences. However, the purpose of a De Bruijn sequence is to pack the index table in situations when the original "unpacked" table (like my table above) does not fit into an integer. As an "unpleasant" side effect, a De Bruijn sequence reorders the index table, breaking the original natural sequence of indices. This requires extra re-mapping effort to extract the proper result from the De Bruijn sequence.
With only 24 bits we don't have this problem here, which means that there's no need to involve De Bruijn and its accompanying tricks.
On the other hand, a packed table requires a shorter shift, which will simplify (and thus optimize) the calculation of the divisor to achieve the desired shift length. In the case of a De Bruijn sequence, there's no need to calculate the divisor at all: your ch is already it. So, a De Bruijn sequence might easily end up being more efficient.
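One possible rewrite in the spirit of "could probably be rewritten more efficiently" is to cube ch first and divide once. This is a sketch, not necessarily what the answer had in mind. The function name bit_position_div is made up, and the code assumes unsigned int is at least 32 bits wide.

```c
/* Divides the 24-bit table once by ch^3 instead of three times by ch.
 * For positive integers the two forms give the same quotient.
 * Assumes unsigned int has at least 32 bits; ch must be nonzero. */
unsigned bit_position_div(unsigned char ch) {
    unsigned cube = (unsigned)ch * ch * ch;   /* 2^(3*i), at most 2^21 */
    return (0xFAC688u / cube) & 0x7u;         /* shifts the table right by 3*i bits */
}
```

Whether two multiplies and one divide beat three divides depends on the target; a compiler cannot turn these divisions into shifts because the divisor is not a compile-time constant.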
Upvotes: 4
Reputation: 20027
Some architectures contain an efficient (single-instruction) implementation of popcount, available in C compilers through intrinsics or __builtin_popcount().
If this is the case, it will be hard to beat popcount(x - 1), which first converts the single set bit (1 << n) to a run of set bits from (1 << (n-1)) down to 1 (or to 0 when x == 1), and then counts the number of ones, which is the index n of the original bit.
Some comments point out "Bit Scan Forward"; however, at least on x86 architectures, that is inferior to popcount. Always know your HW...
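A minimal sketch of popcount(x - 1) follows, assuming GCC or Clang for __builtin_popcount(). The function name bit_index_popcount is made up for illustration.

```c
/* Requires GCC or Clang: __builtin_popcount is a compiler extension.
 * x must have exactly one bit set; x - 1 then has ones in every position
 * below that bit, so counting them gives the bit's index. */
int bit_index_popcount(unsigned char x) {
    return __builtin_popcount((unsigned)x - 1u);
}
```

For example, x = 0x10 gives x - 1 = 0x0F, which has four set bits, so the index is 4.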
Upvotes: 0
Reputation: 13924
You can use a binary search technique here to reduce the number of comparisons from 7 to 3.
assert((n & n-1) == 0);
if (n & 0x0F) {              /* bit is in the low nibble */
    if (n & 0x03) {
        if (n & 0x01) {
            idx = 0;
        } else {
            idx = 1;
        }
    } else {
        if (n & 0x04) {
            idx = 2;
        } else {
            idx = 3;
        }
    }
} else {                     /* bit is in the high nibble */
    if (n & 0x30) {
        if (n & 0x10) {
            idx = 4;
        } else {
            idx = 5;
        }
    } else {
        if (n & 0x40) {
            idx = 6;
        } else {
            idx = 7;
        }
    }
}
Upvotes: 0
Reputation: 59154
Well, if ch has a single bit set, then the count of 1 bits in ch-1 is the index of that bit. Ideally, you'd want to find that without looping or branching, since branches are expensive, so I'd write that something like this:
int index = ((unsigned char)ch)-1;
index = ((index & 0xAA)>>1)+(index & 0x55); //sums of pairs of bits
index = ((index & 0xCC)>>2)+(index & 0x33); //sums of 4s of bits
index = ((index & 0xF0)>>4)+(index & 0x0F); //sum of 8 bits
There is also an extremely clever answer using fewer operations at the cost of a multiplication and a lookup:
int index = indexMap[((((int)(unsigned char)ch)*DEBRUIJN)>>16)&7];
The bits in DEBRUIJN must be a De Bruijn sequence (https://en.wikipedia.org/wiki/De_Bruijn_sequence), ensuring that the lookup index will be different for every value of ch. indexMap maps those lookup indexes to the results you want.
Note also that, following @rici's comment, indexMap is so small that you can pack it into a single int.
Upvotes: 2
Reputation: 71
In terms of lines of code, an efficient option could be a linear search through the bits.
short bit=0;
const char one=1;
while(!((ch >> bit) & one)) ++bit;
Of course, error checking is probably a good idea, so you could also add a check to make sure you are still in a valid bit.
short bit=0;
const char one=1;
while(bit < 8 && !((ch >> bit) & one)) ++bit; /* bit == 8 means no bit was set */
It definitely isn't as computationally efficient, and it would fail to detect when more than one bit was set so the switch case is still probably the way to go for correctness.
This version has fewer jumps in the assembly than a switch case does, so maybe it is more efficient at computing the bit.
short bit=
ch&0x1?0:
(ch&0x2?1:
(ch&0x4?2:
(ch&0x8?3:
(ch&0x10?4:
(ch&0x20?5:
(ch&0x40?6:
(ch&0x80?7:8)))))));
You could skip checking the last bit too and assume that, if nothing else matches, the 7th bit is set, which could save one comparison.
short bit=
ch&0x1?0:
(ch&0x2?1:
(ch&0x4?2:
(ch&0x8?3:
(ch&0x10?4:
(ch&0x20?5:
(ch&0x40?6:7))))));
Upvotes: 0