Reputation: 33
I have just come across a problem where we are supposed to calculate the total number of 1's in the binary representations of all numbers over a large range. Is there any algorithm or technique to find it easily?
For example,
For input N = 6, we count the number of 1's in the binary representation of each preceding number. Like,
1 - 0001 - No. of 1's = 1;
2 - 0010 - No. of 1's = 1;
3 - 0011 - No. of 1's = 2;
4 - 0100 - No. of 1's = 1;
5 - 0101 - No. of 1's = 2;
Constraints: 1 <= N <= 10 ^ 20
So the total is 7 (1+1+2+1+2). Are there any tricks to find this out quickly? Thanks in advance!
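For reference, a brute-force version (the name total_bits_naive is just illustrative) that pins down the quantity being asked for; it is of course far too slow for N up to 10^20:

    # Brute force: total set bits over 1 .. N-1; O(N log N), unusable for N ~ 10^20.
    def total_bits_naive(N):
        return sum(bin(i).count('1') for i in range(1, N))

    print(total_bits_naive(6))  # 7, matching the example above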
Upvotes: 3
Views: 1942
Reputation: 58201
Let S(n) be the set of numbers 0 to n (with no duplicates, but in any order). Then S(2n+1) = {2*s for s in S(n)} + {2*s+1 for s in S(n)}, and S(2n) = {2*s for s in S(n)} + {2*s+1 for s in S(n-1)}.
Two examples:
S(7)  = {2*s for s in S(3)} + {2*s+1 for s in S(3)}
      = {0, 2, 4, 6} + {1, 3, 5, 7}
S(10) = {2*s for s in S(5)} + {2*s+1 for s in S(4)}
      = {0, 2, 4, 6, 8, 10} + {1, 3, 5, 7, 9}
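These identities are easy to check mechanically for small n; a quick sketch:

    # Check that S(2n+1) and S(2n) split into even and odd halves as claimed.
    def S(n):
        return set(range(n + 1))

    for n in range(1, 50):
        assert S(2*n + 1) == {2*s for s in S(n)} | {2*s + 1 for s in S(n)}
        assert S(2*n) == {2*s for s in S(n)} | {2*s + 1 for s in S(n - 1)}
    print("ok")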
Letting a(n) be the total number of bits set in all the numbers in S(n), and using the formulae for S, we have a(2n+1) = 2a(n) + n+1, and a(2n) = a(n) + a(n-1) + n. That's because the number of bits set in {2*s for s in S(n)} is the same as the number of bits set in S(n), and the number of bits set in {2*s+1 for s in S(n)} is the number of bits set in S(n) plus one for every element of S(n) (namely: n+1 extra bits).
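A quick brute-force check of these recurrences (the helper name a is the same as in the text):

    # Verify a(2n+1) = 2a(n) + n + 1 and a(2n) = a(n) + a(n-1) + n by brute force.
    def a(n):
        return sum(bin(i).count('1') for i in range(n + 1))

    for n in range(1, 100):
        assert a(2*n + 1) == 2*a(n) + n + 1
        assert a(2*n) == a(n) + a(n - 1) + n
    print("ok")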
Those same equations appear on https://oeis.org/A000788, credited to Ralf Stephan:
a(0) = 0
a(2n) = a(n)+a(n-1)+n
a(2n+1) = 2a(n)+n+1
Using this, one can write a function B with B(N) = (a(N), a(N-1)):
def B(N):
    if N == 0:
        return 0, 0
    r, s = B(N//2)
    if N % 2:
        return 2*r+N//2+1, r+s+N//2
    else:
        return r+s+N//2, 2*s+N//2
The double return value is a form of dynamic programming, avoiding recomputing the same values multiple times.
The second return value is the one you're interested in. For example:
>>> print(B(7)[1])
9
>>> print(B(28)[1])
64
>>> print(B(10**20)[1])
3301678091638143975424
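As a further sanity check, B should agree with a direct count on small inputs:

    # B(N)[1] should equal the total number of set bits in 0 .. N-1.
    for N in range(1, 500):
        assert B(N)[1] == sum(bin(i).count('1') for i in range(N))
    print("ok")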
The function B obviously runs in O(log N) arithmetic operations, and uses O(log N) stack.
One can reduce the space complexity to O(1) with a little care.
We can write the Ralf Stephan equations in a matrix times vector form:
[ a(2n+1) ]   [ 2 0 1 1 ]   [ a(n)   ]
[ a(2n)   ] = [ 1 1 1 0 ] * [ a(n-1) ]
[ 2n+1    ]   [ 0 0 2 1 ]   [ n      ]
[ 1       ]   [ 0 0 0 1 ]   [ 1      ]
and
[ a(2n)   ]   [ 1 1 1 0 ]   [ a(n)   ]
[ a(2n-1) ] = [ 0 2 1 0 ] * [ a(n-1) ]
[ 2n      ]   [ 0 0 2 0 ]   [ n      ]
[ 1       ]   [ 0 0 0 1 ]   [ 1      ]
Repeatedly applying one or the other of these rules gives:

[ a(n)   ]                               [ a(0)  ]
[ a(n-1) ] = M[0] * M[1] * ... * M[k] * [ a(-1) ]
[ n      ]                               [ 0     ]
[ 1      ]                               [ 1     ]
where M[0], M[1], ..., M[k] are one or the other of the two 4x4 matrices that appear in the matrix-times-vector versions of the Ralf Stephan equations, depending on the k-th bit of n. Since a(0) = a(-1) = 0, the initial vector is just [0, 0, 0, 1], so the result only needs the final column of the matrix product.
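One can check numerically that the two matrices map the vector [a(n), a(n-1), n, 1] to its counterparts for 2n+1 and 2n; a small sketch (reusing the M1/M0 names that also appear in the implementation below):

    # Check that M1 (odd bit) and M0 (even bit) map v(n) = [a(n), a(n-1), n, 1]
    # to v(2n+1) and v(2n) respectively.
    def a(n):
        return sum(bin(i).count('1') for i in range(n + 1))

    def v(n):
        return [a(n), a(n - 1), n, 1]

    M1 = [[2, 0, 1, 1], [1, 1, 1, 0], [0, 0, 2, 1], [0, 0, 0, 1]]
    M0 = [[1, 1, 1, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 1]]

    def apply(M, x):
        return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

    for n in range(1, 50):
        assert apply(M1, v(n)) == v(2*n + 1)
        assert apply(M0, v(n)) == v(2*n)
    print("ok")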
Putting this into code:
def mat_mul(A, B):
    C = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                C[i][k] += A[i][j] * B[j][k]
    return C
M1 = [[2, 0, 1, 1], [1, 1, 1, 0], [0, 0, 2, 1], [0, 0, 0, 1]]
M0 = [[1, 1, 1, 0], [0, 2, 1, 0], [0, 0, 2, 0], [0, 0, 0, 1]]
def B2(N):
    M = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    while N:
        M = mat_mul(M, M1 if N%2 else M0)
        N >>= 1
    return M[1][3]
The function B2 performs O(log N) arithmetic operations, but uses constant space.
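A quick consistency check between B2 and the recursive B:

    # B2(N) should match the second component of B(N) for all N >= 1.
    for N in range(1, 1000):
        assert B2(N) == B(N)[1]
    print("ok")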
We can do a little better, by noting that the M matrix is always of the form:
[ a    b    c    d   ]
[ a-1  b+1  c    e   ]
[ 0    0    a+b  a-1 ]
[ 0    0    0    1   ]
Then, B3 performs the matrix multiplies of B2 in an optimized way, based on this observed structure of M:
def B3(N):
    a, b, c, d, e = 1, 0, 0, 0, 0
    while N:
        if N%2:
            a, c, d, e = 2*a+b, a+b+2*c, a+c+d, a+c+e-1
        else:
            b, c = a+2*b, a+b+2*c
        N >>= 1
    return e
This is as good as this approach can take us: the only arithmetic operations are additions, multiplications by two, divisions by two, and testing the lowest bit. The space complexity is constant. Even for huge N (for example, 10^200), the time taken is negligible.
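For example, B3 reproduces the earlier results, and even a ~665-bit input is handled instantly:

>>> B3(10**20)
3301678091638143975424
>>> B3(10**200) == B(10**200)[1]
True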
For speed, the C version (using gcc's __int128 extension) computes b3(10**20) in approximately 140 nanoseconds on my machine. The code is a straightforward conversion of the B3 Python function (noting that d isn't needed), slightly hindered by the lack of multiple assignment in C.
typedef unsigned __int128 uint128;

uint128 b3(uint128 n) {
    uint128 a=1, b=0, c=0, e=0;
    while (n) {
        if (n&1) {
            e = a+c+e-1;
            c = a+b+2*c;
            a = 2*a+b;
        } else {
            c = a+b+2*c;
            b = a+2*b;
        }
        n >>= 1;
    }
    return e;
}
Upvotes: 3
Reputation: 13525
Make a table for values 0..2^P-1, where P = 8:

byte[] table = new byte[] {0,1,1,2,1,2,2,3, ... 7,8};
and a mask of P units (low-order 1-bits):
long mask = (1 << P)-1;
then split the input number into bytes and sum the table value for each byte:
int numUnits(long number) {
    int sum = 0;
    for (int k = 0; k < 64/P; k++) {
        sum += table[(int)(number & mask)];
        number = number >>> P;  // unsigned shift, so a set sign bit is not smeared
    }
    return sum;
}
Instead of 8, you can take P = 4 or 16, depending on how much memory you can afford for the table.
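For comparison, a rough Python rendering of the same table-driven technique (the names are mine; note that this counts the bits of a single number, and that since Python 3.10 int.bit_count() provides this directly):

    # Table-driven popcount with P = 8: one 256-entry table, eight lookups per 64-bit value.
    P = 8
    TABLE = [bin(i).count('1') for i in range(1 << P)]
    MASK = (1 << P) - 1

    def num_units(number):
        total = 0
        for _ in range(64 // P):
            total += TABLE[number & MASK]
            number >>= P
        return total

    print(num_units(0b101101))  # 4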
Upvotes: 0
Reputation: 476557
Yes. Let us first analyze the number of ones between 1 and a power of two 2^k (lower bound inclusive, upper bound exclusive). We will solve the general problem based on this approach later.
This means that eventually all bit combinations are picked for the last k bits (except 000, but this does not contain any set bits). Indeed, for k=3, we see 001, 010, 011, 100, 101, 110 and 111. So on average, half of the bits are set. We thus know that the total number of bits set is:
 2^k
 ---
 \     k         k-1
 /    ---   =   2    * k
 ---   2
 i=1
So for a range between 1 (or 0, but that makes no difference, since 0 has no set bits) and 2^k, we have 2^(k-1)×k set bits. For example with k=3, we count 2^2×3 = 12 bits, which is indeed what we see when we manually enumerate over it.
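A short check confirms this for small k:

    # Total number of set bits in 0 .. 2^k - 1 equals 2^(k-1) * k.
    for k in range(1, 16):
        assert sum(bin(i).count('1') for i in range(1 << k)) == (1 << (k - 1)) * k
    print("ok")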
How does this help us in the general case?
Say we want to calculate the number of bits set between 0 and l, with 2^k < l < 2^(k+1); then we can first count the total number of bits set up to 2^k, and then add to this the total number of bits set between 2^k and l.
Now there is of course still a problem with the latter, since we do not know how to calculate that directly. But we can perform a "shift": we calculate the total number of bits between 0 and l-2^k (we know how to do that), and add l-2^k extra to that result. We calculate the total number of bits between 0 and l-2^k in the same way; we know, however, that the largest power of two in l-2^k will be less than 2^k, since 2^k was the highest power of two in l, so "progress" is guaranteed.
How does adding l-2^k to the result work? Let us take an example: if we want to calculate the number of set bits between 000 and 110 (exclusive), then we have to sum up the bits of 000, 001, 010, 011, which is the first "iteration". The second iteration then covers the bits set between 100 and 110. We do that by performing a shift and calculating the number of set bits between 00 and 10, but every number here has one extra bit set in the "original" numbers: the highest set bit. So we count the number of elements over which we iterate, and thus compensate for the loss of bits.
Algorithm: we can now derive an algorithm for that with:
def count_bit_range(n):
    if n <= 1:
        return 0
    k = n.bit_length()-1
    pk = 1 << k
    pk1 = 1 << (k-1)
    return k * pk1 + (n-pk) + count_bit_range(n-pk)
or a non-recursive approach:
def count_bit_range(n):
    c = 0
    while n > 1:
        k = n.bit_length()-1
        pk = 1 << k
        pk1 = 1 << (k-1)
        c += k * pk1 + n - pk
        n -= pk
    return c
For example:
>>> count_bit_range(0)
0
>>> count_bit_range(1)
0
>>> count_bit_range(2)
1
>>> count_bit_range(3)
2
>>> count_bit_range(4)
4
>>> count_bit_range(5)
5
>>> count_bit_range(6)
7
>>> count_bit_range(12)
20
>>> count_bit_range(28)
64
For example for 12, we get:
0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011
so 20 set bits.
or for 28:
00001 00010 00011 00100 00101 00110 00111
01000 01001 01010 01011 01100 01101 01110 01111
10000 10001 10010 10011 10100 10101 10110 10111
11000 11001 11010 11011
which is indeed 64.
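Both versions should also agree with a direct count for small upper bounds; a quick check:

    # count_bit_range(n) counts the set bits of 1 .. n-1 (upper bound exclusive).
    for n in range(1000):
        assert count_bit_range(n) == sum(bin(i).count('1') for i in range(n))
    print("ok")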
Benchmarks: if we run the algorithm with the upper bound (10^20), we obtain 11.9 microseconds per call on a local machine:
>>> from functools import partial
>>> from timeit import timeit
>>> timeit(partial(count_bit_range, 10**20), number=1000000)
11.911393816000782
this is (likely) not the most expensive number in the range, however: the number of recursive calls scales with the number of set bits of the upper bound, and hence the most expensive number in the range is likely (1<<66)-1:
>>> timeit(partial(count_bit_range, (1<<66)-1), number=1000000)
32.43066442897543
but 32.4 microseconds per call still looks reasonable to calculate the number of bits set between 1 and 73'786'976'294'838'206'463.
On a local machine, the non-recursive approach gives an instant result up to 10^200'000.
Time complexity: the number of recursive calls scales with the number of set bits in the upper bound: indeed, each iteration removes the highest set bit, and the algorithm stops when the upper bound hits one or zero. For a w-bit number, this algorithm thus takes O(w) recursive calls (this is not per se the number of "basic operations").
The exact time complexity is a bit harder to calculate, since during a call we perform a lot of calculations on variables that can, strictly speaking, get arbitrarily large, and additions, subtractions, etc. take non-constant time on arbitrarily large numbers.
We can assume that most operations run in time linear in the number of bits of that number (like .bit_length() and the binary shifts), but multiplication takes more than linear time (in the length). Especially since k has a length that scales logarithmically with the length of the upper bound, it is "tricky". If we assume that the recursive step takes quadratic time in the length of the upper bound (this is likely an overestimate), we obtain a time complexity of O(w^3), or for a range up to n, a time complexity of O(log^3 n).
Upvotes: 1