Reputation: 33
I have just come across a problem where we are supposed to calculate the total number of 1's in the binary representations of all numbers over a large range. Is there any algorithm or technique to find it easily?
For example,
For input N = 6, we count the number of 1's in the binary representation of each preceding number. Like,
1 - 0001 - No. of 1's = 1;
2 - 0010 - No. of 1's = 1;
3 - 0011 - No. of 1's = 2;
4 - 0100 - No. of 1's = 1;
5 - 0101 - No. of 1's = 2;
Constraints: 1 <= N <= 10 ^ 20
So the total is 7 (1+1+2+1+2). Are there any tricks to find this out quickly? Thanks in advance!
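For reference, a brute-force version (the name total_bits_naive is just illustrative) that pins down the quantity being asked for; it is of course far too slow for N up to 10^20:

    # Brute force: total set bits over 1 .. N-1; O(N log N), unusable for N ~ 10^20.
    def total_bits_naive(N):
        return sum(bin(i).count('1') for i in range(1, N))

    print(total_bits_naive(6))  # 7, matching the example above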
Upvotes: 3
Views: 1942
Reputation: 58201
Let S(n) be the set of numbers 0 to n (with no duplicates, but in any order). Then S(2n+1) = {2*s for s in S(n)} + {2*s+1 for s in S(n)}, and S(2n) = {2*s for s in S(n)} + {2*s+1 for s in S(n-1)}.
Two examples:
S(7)  = {2*s for s in S(3)} + {2*s+1 for s in S(3)}
      = {0, 2, 4, 6} + {1, 3, 5, 7}
S(10) = {2*s for s in S(5)} + {2*s+1 for s in S(4)}
      = {0, 2, 4, 6, 8, 10} + {1, 3, 5, 7, 9}
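These identities are easy to check mechanically for small n; a quick sketch:

    # Check that S(2n+1) and S(2n) split into even and odd halves as claimed.
    def S(n):
        return set(range(n + 1))

    for n in range(1, 50):
        assert S(2*n + 1) == {2*s for s in S(n)} | {2*s + 1 for s in S(n)}
        assert S(2*n) == {2*s for s in S(n)} | {2*s + 1 for s in S(n - 1)}
    print("ok")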
Letting a(n) be the total number of bits set in all the numbers in S(n), and using the formulae for S, we have a(2n+1) = 2a(n) + n+1, and a(2n) = a(n) + a(n-1) + n. That's because the number of bits set in {2*s for s in S(n)} is the same as the number of bits set in S(n), and the number of bits set in {2*s+1 for s in S(n)} is the number of bits set in S(n) plus one for every element of S(n) (namely: n+1 extra bits).
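A quick brute-force check of these recurrences (the helper name a is the same as in the text):

    # Verify a(2n+1) = 2a(n) + n + 1 and a(2n) = a(n) + a(n-1) + n by brute force.
    def a(n):
        return sum(bin(i).count('1') for i in range(n + 1))

    for n in range(1, 100):
        assert a(2*n + 1) == 2*a(n) + n + 1
        assert a(2*n) == a(n) + a(n - 1) + n
    print("ok")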
Those same equations appear on https://oeis.org/A000788, credited to Ralf Stephan:
a(0) = 0
a(2n) = a(n)+a(n-1)+n
a(2n+1) = 2a(n)+n+1
Using this, one can write a function B with B(N) = (a(N), a(N-1)):
def B(N):
    if N == 0:
        return 0, 0
    r, s = B(N//2)
    if N % 2:
        return 2*r+N//2+1, r+s+N//2
    else:
        return r+s+N//2, 2*s+N//2
The double return value is a form of dynamic programming, avoiding recomputing the same values multiple times.
The second return value is the one you're interested in. For example:
>>> print(B(7)[1])
9
>>> print(B(28)[1])
64
>>> print(B(10**20)[1])
3301678091638143975424
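As a further sanity check, B should agree with a direct count on small inputs:

    # B(N)[1] should equal the total number of set bits in 0 .. N-1.
    for N in range(1, 500):
        assert B(N)[1] == sum(bin(i).count('1') for i in range(N))
    print("ok")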
The function B obviously runs in O(log N) arithmetic operations, and uses O(log N) stack.
One can reduce the space complexity to O(1) with a little care.
We can write the Ralf Stephan equations in a matrix times vector form:
[ a(2n+1) ]   [ 2 0 1 1 ]   [ a(n)   ]
[ a(2n)   ] = [ 1 1 1 0 ] * [ a(n-1) ]
[ 2n+1    ]   [ 0 0 2 1 ]   [ n      ]
[ 1       ]   [ 0 0 0 1 ]   [ 1      ]
and
[ a(2n)   ]   [ 1 1 1 0 ]   [ a(n)   ]
[ a(2n-1) ] = [ 0 2 1 0 ] * [ a(n-1) ]
[ 2n      ]   [ 0 0 2 0 ]   [ n      ]
[ 1       ]   [ 0 0 0 1 ]   [ 1      ]
Repeatedly applying one or the other of these rules gives:

[ a(n)   ]                               [ a(0)  ]
[ a(n-1) ] = M[0] * M[1] * ... * M[k] * [ a(-1) ]
[ n      ]                               [ 0     ]
[ 1      ]                               [ 1     ]
where M[0], M[1], ..., M[k] are one or the other of the two 4x4 matrices that appear in the matrix-times-vector versions of the Ralf Stephan equations, depending on the k-th bit of n. Since a(0) = a(-1) = 0, the initial vector is just [0, 0, 0, 1], so the result only needs the final column of the matrix product.
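One can check numerically that the two matrices map the vector [a(n), a(n-1), n, 1] to its counterparts for 2n+1 and 2n; a small sketch (reusing the M1/M0 names that also appear in the implementation below):

    # Check that M1 (odd bit) and M0 (even bit) map v(n) = [a(n), a(n-1), n, 1]
    # to v(2n+1) and v(2n) respectively.
    def a(n):
        return sum(bin(i).count('1') for i in range(n + 1))

    def v(n):
        return [a(n), a(n - 1), n, 1]

    M1 = [[2, 0, 1, 1], [1, 1, 1, 0], [0, 0, 2, 1], [0, 0, 0, 1]]
    M0 = [[1, 1, 1, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 1]]

    def apply(M, x):
        return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

    for n in range(1, 50):
        assert apply(M1, v(n)) == v(2*n + 1)
        assert apply(M0, v(n)) == v(2*n)
    print("ok")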
Putting this into code:
def mat_mul(A, B):
    C = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                C[i][k] += A[i][j] * B[j][k]
    return C
M1 = [[2, 0, 1, 1], [1, 1, 1, 0], [0, 0, 2, 1], [0, 0, 0, 1]]
M0 = [[1, 1, 1, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
def B2(N):
    M = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    while N:
        M = mat_mul(M, M1 if N%2 else M0)
        N >>= 1
    return M[1][3]
The function B2 performs O(log N) arithmetic operations, but uses constant space.
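A quick consistency check between B2 and the recursive B:

    # B2(N) should match the second component of B(N) for all N >= 1.
    for N in range(1, 1000):
        assert B2(N) == B(N)[1]
    print("ok")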
We can do a little better, by noting that the M matrix is always of the form:
[ a    b    c    d   ]
[ a-1  b+1  c    e   ]
[ 0    0    a+b  a-1 ]
[ 0    0    0    1   ]
Then, B3 performs the matrix multiplies of B2 in an optimized way, based on this observed structure of M:
def B3(N):
    a, b, c, d, e = 1, 0, 0, 0, 0
    while N:
        if N%2:
            a, c, d, e = 2*a+b, a+b+2*c, a+c+d, a+c+e-1
        else:
            b, c = a+2*b, a+b+2*c
        N >>= 1
    return e
This is as good as this approach can take us: the only arithmetic operations are additions, multiplications by two, divisions by two, and testing the lowest bit. The space complexity is constant. Even for huge N (for example, 10^200), the time taken is negligible.
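For example, B3 reproduces the earlier results, and even a ~665-bit input is handled instantly:

>>> B3(10**20)
3301678091638143975424
>>> B3(10**200) == B(10**200)[1]
True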
For speed, the C version (using gcc's __int128 extension) computes b3(10**20) in approximately 140 nanoseconds on my machine. The code is a straightforward conversion of the B3 Python function (noting that d isn't needed), slightly hindered by the lack of multiple assignment in C.
typedef unsigned __int128 uint128;

uint128 b3(uint128 n) {
    uint128 a=1, b=0, c=0, e=0;
    while (n) {
        if (n&1) {
            e = a+c+e-1;
            c = a+b+2*c;
            a = 2*a+b;
        } else {
            c = a+b+2*c;
            b = a+2*b;
        }
        n >>= 1;
    }
    return e;
}
Upvotes: 3
Reputation: 13525
Make a table for values 0..2^P-1, where P = 8:

byte[] table = new byte[] {0,1,1,2,1,2,2,3, ... 7,8};
and a mask of P units (low-order 1-bits):
long mask = (1 << P)-1;
then split the input number into bytes and sum the table value for each byte:
int numUnits(long number) {
    int sum = 0;
    for (int k = 0; k < 64/P; k++) {
        sum += table[(int)(number & mask)];
        number = number >>> P;  // unsigned shift, so a set sign bit is not smeared
    }
    return sum;
}
Instead of 8, you can take P = 4 or 16, depending on how much memory you can afford for the table.
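For comparison, a rough Python rendering of the same table-driven technique (the names are mine; note that this counts the bits of a single number, and that since Python 3.10 int.bit_count() provides this directly):

    # Table-driven popcount with P = 8: one 256-entry table, eight lookups per 64-bit value.
    P = 8
    TABLE = [bin(i).count('1') for i in range(1 << P)]
    MASK = (1 << P) - 1

    def num_units(number):
        total = 0
        for _ in range(64 // P):
            total += TABLE[number & MASK]
            number >>= P
        return total

    print(num_units(0b101101))  # 4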
Upvotes: 0
Reputation: 476557
Yes. Let us first analyze the number of ones between 1 and a power of two 2^k (lower bound inclusive, upper bound exclusive). We will solve the general problem based on this approach later.
This means that eventually all bit combinations are picked for the last k bits (except 000, but this does not contain any set bits). Indeed, for k=3, we see 001, 010, 011, 100, 101, 110 and 111. So on average, half of the bits are set. We thus know that the total number of bits set is:
 2^k
 ---
 \     k         k-1
 /    ---   =   2    * k
 ---   2
 i=1
So for a range between 1 (or 0, but that makes no difference, since 0 has no set bits) and 2^k, we have 2^(k-1)×k set bits. For example with k=3, we count 2^2×3 = 12 bits, which is indeed what we see when we manually enumerate over it.
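A short check confirms this for small k:

    # Total number of set bits in 0 .. 2^k - 1 equals 2^(k-1) * k.
    for k in range(1, 16):
        assert sum(bin(i).count('1') for i in range(1 << k)) == (1 << (k - 1)) * k
    print("ok")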
How does this help us in the general case?
Say we want to calculate the number of bits set between 0 and l, with 2^k < l < 2^(k+1); then we can first count the total number of bits set up to 2^k, and then add to this the total number of bits set between 2^k and l.
Now there is of course still a problem with the latter, since we do not know how to calculate that directly. But we can perform a "shift": we calculate the total number of bits between 0 and l-2^k (we know how to do that), and add l-2^k extra to that result. We calculate the total number of bits between 0 and l-2^k in the same way; we know, however, that the largest power of two in l-2^k will be less than 2^k, since 2^k was the highest power of two in l, so "progress" is guaranteed.
How does adding l-2^k to the result work? Let us take an example: if we want to calculate the number of set bits between 000 and 110 (exclusive), then we have to sum up the bits of 000, 001, 010, 011, which is the first "iteration". The second iteration then covers the bits set between 100 and 110. We do that by performing a shift and calculating the number of set bits between 00 and 10, but every number here has one extra bit set in the "original" numbers: the highest set bit. So we count the number of elements over which we iterate, and thus compensate for the loss of bits.
Algorithm: we can now derive an algorithm for that with:
def count_bit_range(n):
    if n <= 1:
        return 0
    k = n.bit_length()-1
    pk = 1 << k
    pk1 = 1 << (k-1)
    return k * pk1 + (n-pk) + count_bit_range(n-pk)
or a non-recursive approach:
def count_bit_range(n):
    c = 0
    while n > 1:
        k = n.bit_length()-1
        pk = 1 << k
        pk1 = 1 << (k-1)
        c += k * pk1 + n - pk
        n -= pk
    return c
For example:
>>> count_bit_range(0)
0
>>> count_bit_range(1)
0
>>> count_bit_range(2)
1
>>> count_bit_range(3)
2
>>> count_bit_range(4)
4
>>> count_bit_range(5)
5
>>> count_bit_range(6)
7
>>> count_bit_range(12)
20
>>> count_bit_range(28)
64
For example for 12, we get:
0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011
so 20 set bits.
or for 28:
00001 00010 00011 00100 00101 00110 00111
01000 01001 01010 01011 01100 01101 01110 01111
10000 10001 10010 10011 10100 10101 10110 10111
11000 11001 11010 11011
which is indeed 64.
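Both versions should also agree with a direct count for small upper bounds; a quick check:

    # count_bit_range(n) counts the set bits of 1 .. n-1 (upper bound exclusive).
    for n in range(1000):
        assert count_bit_range(n) == sum(bin(i).count('1') for i in range(n))
    print("ok")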
Benchmarks: if we run the algorithm with the upper bound (10^20), we obtain 11.9 microseconds per call on a local machine:
>>> from functools import partial
>>> from timeit import timeit
>>> timeit(partial(count_bit_range, 10**20), number=1000000)
11.911393816000782
this is (likely) not the most expensive number in the range, however: the number of recursive calls scales with the number of set bits of the upper bound, and hence the most expensive number in the range is likely (1<<66)-1:
>>> timeit(partial(count_bit_range, (1<<66)-1), number=1000000)
32.43066442897543
but 32.4 microseconds per call still looks reasonable to calculate the number of bits set between 1 and 73'786'976'294'838'206'463.
On a local machine, the non-recursive approach gives an instant result up to 10^200'000.
Time complexity: the number of recursive calls scales with the number of set bits in the upper bound: indeed, each iteration removes the highest set bit, and the algorithm stops when the upper bound hits one or zero. For a w-bit number, this algorithm thus takes O(w) recursive calls (this is not per se the number of "basic operations").
The exact time complexity is a bit harder to calculate, since during a call we perform a lot of calculations on variables that can, strictly speaking, get arbitrarily large, and additions, subtractions, etc. take non-constant time on arbitrarily large numbers.
We can assume that most operations run in time linear in the number of bits of that number (like .bit_length() and the binary shifts), but multiplication takes more than linear time (in the length). Especially since k has a length that scales logarithmically with the length of the upper bound, it is "tricky". If we assume that the recursive step takes quadratic time in the length of the upper bound (this is likely an overestimate), we obtain a time complexity of O(w^3), or for a range up to n, a time complexity of O(log^3 n).
Upvotes: 1