Matt Howells

Reputation: 41266

Count the number of set bits in a 32-bit integer

8 bits representing the number 7 look like this:

00000111

Three bits are set.

What are the algorithms to determine the number of set bits in a 32-bit integer?

Upvotes: 1024

Views: 665208

Answers (30)

Erorr

Reputation: 1

I think Brian Kernighan's method will be useful too... It goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, it will go through the loop only once.

/** count the number of bits set in n */
int countSetBits(unsigned int n) {
    unsigned int c; // c accumulates the total bits set in n
    for (c=0;n>0;n=n&(n-1)) c++; 
    return c; 
}

Published in 1988, the C Programming Language 2nd Ed. (by Brian W. Kernighan and Dennis M. Ritchie) mentions this in exercise 2-9. On April 19, 2006 Don Knuth pointed out to me that this method "was first published by Peter Wegner in CACM 3 (1960), 322. (Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.)"

Upvotes: 12

Robert S. Barnes

Reputation: 40548

Here is a portable module (ANSI C) which can benchmark each of your algorithms on any architecture.

Your CPU has 9-bit bytes? No problem :-) At the moment it implements two algorithms, the K&R algorithm and a byte-wise lookup table. The lookup table is on average three times as fast as the K&R algorithm. If someone can figure out a way to make the "Hacker's Delight" algorithm portable, feel free to add it in.

#ifndef _BITCOUNT_H_
#define _BITCOUNT_H_

/* Return the Hamming Weight of val, i.e. the number of 'on' bits. */
int bitcount( unsigned int );

/* List of available bitcount algorithms.  
 * onTheFly:    Calculate the bitcount on demand.
 *
 * lookupTable: Uses a small lookup table to determine the bitcount.  This
 * method is on average 3 times as fast as onTheFly, but incurs a small
 * upfront cost to initialize the lookup table on the first call.
 *
 * strategyCount is just a placeholder. 
 */
enum strategy { onTheFly, lookupTable, strategyCount };

/* String representations of the algorithm names */
extern const char *strategyNames[];

/* Choose which bitcount algorithm to use. */
void setStrategy( enum strategy );

#endif


#include <limits.h>

#include "bitcount.h"

/* The number of entries needed in the table is equal to the number of
 * unique values a char can represent which is always UCHAR_MAX + 1 */
static unsigned char _bitCountTable[UCHAR_MAX + 1];
static unsigned int _lookupTableInitialized = 0;

static int _defaultBitCount( unsigned int val ) {
    int count;

    /* Starting with:
     * 1100 - 1 == 1011,  1100 & 1011 == 1000
     * 1000 - 1 == 0111,  1000 & 0111 == 0000
     */
    for ( count = 0; val; ++count )
        val &= val - 1;

    return count;
}

/* Looks up each byte of the integer in a lookup table.
 *
 * The first time the function is called it initializes the lookup table.
 */
static int _tableBitCount( unsigned int val ) {
    int bCount = 0;

    if ( !_lookupTableInitialized ) {
        unsigned int i;
        for ( i = 0; i != UCHAR_MAX + 1; ++i )
            _bitCountTable[i] =
                ( unsigned char )_defaultBitCount( i );

        _lookupTableInitialized = 1;
    }

    for ( ; val; val >>= CHAR_BIT )
        bCount += _bitCountTable[val & UCHAR_MAX];

    return bCount;
}

static int ( *_bitcount ) ( unsigned int ) = _defaultBitCount;

const char *strategyNames[] = { "onTheFly", "lookupTable" };

void setStrategy( enum strategy s ) {
    switch ( s ) {
    case onTheFly:
        _bitcount = _defaultBitCount;
        break;
    case lookupTable:
        _bitcount = _tableBitCount;
        break;
    case strategyCount:
        break;
    }
}

/* Just a forwarding function which will call whichever version
 * of the algorithm has been selected by the client 
 */
int bitcount( unsigned int val ) {
    return _bitcount( val );
}

#ifdef _BITCOUNT_EXE_

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Use the same sequence of pseudo random numbers to benchmark each
 * Hamming Weight algorithm.
 */
void benchmark( int reps ) {
    clock_t start, stop;
    int i, j;
    static const int iterations = 1000000;

    for ( j = 0; j != strategyCount; ++j ) {
        setStrategy( j );

        srand( 257 );

        start = clock(  );

        for ( i = 0; i != reps * iterations; ++i )
            bitcount( rand(  ) );

        stop = clock(  );

        printf
            ( "\n\t%d pseudo-random integers using %s: %f seconds\n\n",
              reps * iterations, strategyNames[j],
              ( double )( stop - start ) / CLOCKS_PER_SEC );
    }
}

int main( void ) {
    int option;

    while ( 1 ) {
        printf( "Menu Options\n"
            "\t1.\tPrint the Hamming Weight of an Integer\n"
            "\t2.\tBenchmark Hamming Weight implementations\n"
            "\t3.\tExit ( or cntl-d )\n\n\t" );

        if ( scanf( "%d", &option ) == EOF )
            break;

        switch ( option ) {
        case 1:
            printf( "Please enter the integer: " );
            if ( scanf( "%d", &option ) != EOF )
                printf
                    ( "The Hamming Weight of %d ( 0x%X ) is %d\n\n",
                      option, option, bitcount( option ) );
            break;
        case 2:
            printf
                ( "Please select number of reps ( in millions ): " );
            if ( scanf( "%d", &option ) != EOF )
                benchmark( option );
            break;
        case 3:
            goto EXIT;
            break;
        default:
            printf( "Invalid option\n" );
        }
    }

 EXIT:
    printf( "\n" );

    return 0;
}
#endif

Upvotes: 5

Mayukh Pankaj

Reputation: 181

Counting set bits in the binary representation of N:

Pseudocode:

  1. Set counter = 0.

  2. Repeat while N is not zero:

     2.1. Check the last bit: if the last bit is 1, increment the counter.

     2.2. Discard the last bit of N.

Now let's code this in C++:

int countSetBits(unsigned int n){
    int count = 0;

    while(n!=0){
        count += n&1;
        n = n >>1;
    }

    return count;
}

Let's use this function.

#include <iostream>
using namespace std;

int main(){
    int x = 5;

    cout<<countSetBits(x);

    return 0;
}

Output: 2

Because 5 has 2 bits set in binary representation (101).


Upvotes: 0

cipilo

Reputation: 131

I have not seen this approach anywhere:

int nbits(unsigned char v) {
    return ((((v - ((v >> 1) & 0x55)) * 0x1010101) & 0x30c00c03) * 0x10040041) >> 28;
}


It works per byte, so it would have to be called four times for a 32-bit integer. It is derived from the sideways addition, but it uses two 32-bit multiplications to reduce the number of instructions to only seven.
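
As a rough sketch, applying the per-byte routine to a 32-bit value just means one call per byte (the nbits32 name below is mine, and it assumes the nbits() function above):

#include <stdint.h>

/* Sketch: call the per-byte nbits() above once per byte and add the results. */
static int nbits32(uint32_t v) {
    return nbits(v & 0xFF) + nbits((v >> 8) & 0xFF)
         + nbits((v >> 16) & 0xFF) + nbits(v >> 24);
}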

Most current C compilers will optimize this function using SIMD (SSE2) instructions when it is clear that the number of requests is a multiple of 4, and it becomes quite competitive.
It is portable, can be defined as a macro or inline function and does not need data tables.

This approach can be extended to work on 16 bits at a time, using 64-bit multiplications. However, it fails when all 16 bits are set, returning zero, so it can be used only when the 0xFFFF input value is not present.
It is also slower due to the 64-bit operations and does not optimize well.


It turns out that Hacker's Delight (Anderson) includes a solution with the same features, using only five instructions.

static uint32_t hd8(uint8_t v) {
    return ((((v * 0x8040201u) >> 3u) & 0x11111111u) * 0x11111111u) >> 28u;
}

Upvotes: 0

Matt Howells

Reputation: 41266

This is known as the 'Hamming Weight', 'popcount' or 'sideways addition'.

Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. Instructions like x86's popcnt (on CPUs where it's supported) will almost certainly be fastest for a single integer. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed - hardware popcount is normally fast if it exists at all.).

The 'best' algorithm really depends on which CPU you are on and what your usage pattern is.

Your compiler may know how to do something that's good for the specific CPU you're compiling for, e.g. C++20 std::popcount(), or C++ std::bitset<32>::count(), as a portable way to access builtin / intrinsic functions (see another answer on this question). But your compiler's choice of fallback for target CPUs that don't have hardware popcnt might not be optimal for your use-case. Or your language (e.g. C) might not expose any portable function that could use a CPU-specific popcount when there is one.


Portable algorithms that don't need (or benefit from) any HW support

A pre-populated table lookup method can be very fast if your CPU has a large cache and you are doing lots of these operations in a tight loop. However it can suffer because of the expense of a 'cache miss', where the CPU has to fetch some of the table from main memory. (Look up each byte separately to keep the table small.) If you want popcount for a contiguous range of numbers, only the low byte is changing for groups of 256 numbers, making this very good.
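
For illustration, a minimal byte-wise table sketch (the popcount_table / popcount_by_table names are placeholders of mine, not a standard API; the table has to be filled once before use):

#include <stdint.h>

static unsigned char popcount_table[256];   /* popcount of every possible byte value */

static void init_popcount_table(void) {
    for (int i = 1; i < 256; i++)           /* table[0] is 0 from static initialization */
        popcount_table[i] = (unsigned char)((i & 1) + popcount_table[i >> 1]);
}

static int popcount_by_table(uint32_t x) {
    /* one lookup per byte keeps the table at 256 entries */
    return popcount_table[x & 0xFF] + popcount_table[(x >> 8) & 0xFF]
         + popcount_table[(x >> 16) & 0xFF] + popcount_table[x >> 24];
}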

If you know that your bytes will be mostly 0's or mostly 1's then there are efficient algorithms for these scenarios, e.g. clearing the lowest set bit with a bithack in a loop until it becomes zero.

I believe a very good general purpose algorithm is the following, known as 'parallel' or 'variable-precision SWAR algorithm'. I have expressed this in a C-like pseudo language, you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):

GCC 10 and Clang 10.0 can recognize this pattern / idiom and compile it to a hardware popcnt or equivalent instruction when available, giving you the best of both worlds. (Godbolt)

int numberOfSetBits(uint32_t i)
{
     // Java: use int, and use >>> instead of >>. Or use Integer.bitCount()
     // C or C++: use uint32_t
     i = i - ((i >> 1) & 0x55555555);        // add pairs of bits
     i = (i & 0x33333333) + ((i >> 2) & 0x33333333);  // quads
     i = (i + (i >> 4)) & 0x0F0F0F0F;        // groups of 8
     i *= 0x01010101;                        // horizontal sum of bytes
     return  i >> 24;               // return just that top byte (after truncating to 32-bit even when int is wider than uint32_t)
}

For JavaScript: coerce to integer with |0 for performance: change the first line to i = (i|0) - ((i >> 1) & 0x55555555);

This has the best worst-case behaviour of any of the algorithms discussed, so will efficiently deal with any usage pattern or values you throw at it. (Its performance is not data-dependent on normal CPUs where all integer operations including multiply are constant-time. It doesn't get any faster with "simple" inputs, but it's still pretty decent.)


How this SWAR bithack works:

i = i - ((i >> 1) & 0x55555555);

The first step is an optimized version of masking to isolate the odd / even bits, shifting to line them up, and adding. This effectively does 16 separate additions in 2-bit accumulators (SWAR = SIMD Within A Register). Like (i & 0x55555555) + ((i>>1) & 0x55555555).

The next step adds adjacent (odd and even) pairs of those sixteen 2-bit accumulators, producing eight 4-bit sums. The i - ... optimization isn't possible this time, so it just masks both the unshifted and the shifted value. Using the same 0x33... constant both times, instead of 0xccc... before shifting, is a good thing when compiling for ISAs that need to construct 32-bit constants in registers separately.

The final shift-and-add step of (i + (i >> 4)) & 0x0F0F0F0F widens to 4x 8-bit accumulators. It masks after adding instead of before, because the maximum value in any 4-bit accumulator is 4 (reached if all 4 of the corresponding input bits were set). 4+4 = 8, which still fits in 4 bits, so carry between nibble elements is impossible in i + (i >> 4).

So far this is just fairly normal SIMD using SWAR techniques with a few clever optimizations. Continuing on with the same pattern for 2 more steps can widen to 2x 16-bit then 1x 32-bit counts. But there is a more efficient way on machines with fast hardware multiply:

Once we have few enough "elements", a multiply with a magic constant can sum all the elements into the top element. In this case byte elements. Multiply is done by left-shifting and adding, so a multiply of x * 0x01010101 results in x + (x<<8) + (x<<16) + (x<<24). Our 8-bit elements are wide enough (and holding small enough counts) that this doesn't produce carry into the top 8 bits.

A 64-bit version of this can do 8x 8-bit elements in a 64-bit integer with a 0x0101010101010101 multiplier, and extract the high byte with >>56. So it doesn't take any extra steps, just wider constants. This is what GCC uses for __builtin_popcountll on x86 systems when the hardware popcnt instruction isn't enabled. If you can use builtins or intrinsics for this, do so to give the compiler a chance to do target-specific optimizations.
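
A sketch of that 64-bit variant (the same steps as the 32-bit function above, just with widened constants; the function name here is my own):

#include <stdint.h>

int numberOfSetBits64(uint64_t i)
{
    i = i - ((i >> 1) & 0x5555555555555555ULL);                            // pairs
    i = (i & 0x3333333333333333ULL) + ((i >> 2) & 0x3333333333333333ULL);  // quads
    i = (i + (i >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // bytes
    return (int)((i * 0x0101010101010101ULL) >> 56);                       // horizontal sum, take top byte
}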


With full SIMD for wider vectors (e.g. counting a whole array)

This bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction. (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later.)

However, the best way to use vector instructions for popcount is usually by using a variable-shuffle to do a table-lookup for 4 bits at a time of each byte in parallel. (The 4 bits index a 16-entry table held in a vector register.)
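
As a rough illustration of that nibble-LUT idea, here is a sketch using SSE intrinsics (compile with SSSE3 enabled, e.g. -mssse3; the function name and the final horizontal sum via _mm_sad_epu8 are my choices, not something prescribed by this answer):

#include <immintrin.h>   // SSE2/SSSE3 intrinsics
#include <stdint.h>

/* Popcount of 16 bytes at once: each nibble indexes a 16-entry table held in
 * a vector register (PSHUFB), then the per-byte counts are summed. */
static uint64_t popcount_16bytes_ssse3(__m128i v)
{
    const __m128i lut      = _mm_setr_epi8(0,1,1,2, 1,2,2,3, 1,2,2,3, 2,3,3,4);
    const __m128i low_mask = _mm_set1_epi8(0x0F);

    __m128i lo  = _mm_and_si128(v, low_mask);                      // low nibble of each byte
    __m128i hi  = _mm_and_si128(_mm_srli_epi16(v, 4), low_mask);   // high nibble of each byte
    __m128i cnt = _mm_add_epi8(_mm_shuffle_epi8(lut, lo),
                               _mm_shuffle_epi8(lut, hi));         // per-byte popcounts

    __m128i sums = _mm_sad_epu8(cnt, _mm_setzero_si128());         // two 64-bit partial sums
    return (uint64_t)_mm_cvtsi128_si64(sums)
         + (uint64_t)_mm_cvtsi128_si64(_mm_srli_si128(sums, 8));
}

In a real array-counting loop you would normally accumulate the per-byte counts across several iterations before doing the _mm_sad_epu8 reduction.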

On Intel CPUs, the hardware 64bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.

Upvotes: 999

Nils Pipenbrinck

Reputation: 86313

Some languages portably expose the operation in a way that can use efficient hardware support if available, otherwise some library fallback that's hopefully decent.

For example (by language):

  • C++ has std::bitset<>::count(), or C++20 std::popcount(T x)
  • Java has java.lang.Integer.bitCount() (also for Long or BigInteger)
  • C# has System.Numerics.BitOperations.PopCount()
  • Python has int.bit_count() (since 3.10)

Not all compilers / libraries actually manage to use HW support when it's available, though. (Notably MSVC: even with options that make std::popcount inline as x86 popcnt, its std::bitset::count still always uses a lookup table. This will hopefully change in future versions.)

Also consider the built-in functions of your compiler when the portable language doesn't have this basic bit operation. In GNU C for example:

int __builtin_popcount (unsigned int x);
int __builtin_popcountll (unsigned long long x);

In the worst case (no single-instruction HW support) the compiler will generate a call to a function (which in current GCC uses a shift/and bit-hack like this answer, at least for x86). In the best case the compiler will emit a cpu instruction to do the job. (Just like a * or / operator - GCC will use a hardware multiply or divide instruction if available, otherwise will call a libgcc helper function.) Or even better, if the operand is a compile-time constant after inlining, it can do constant-propagation to get a compile-time-constant popcount result.

The GCC builtins even work across multiple platforms. Popcount has almost become mainstream in the x86 architecture, so it makes sense to start using the builtin now so you can recompile to let it inline a hardware instruction when you compile with -mpopcnt or something that includes it. Other architectures have had popcount for years, but in the x86 world there are still some ancient Core 2 and similar vintage AMD CPUs in use.


On x86, you can tell the compiler that it can assume support for popcnt instruction with -mpopcnt (also implied by -msse4.2). See GCC x86 options. -march=nehalem -mtune=skylake (or -march= whatever CPU you want your code to assume and to tune for) could be a good choice. Running the resulting binary on an older CPU will result in an illegal-instruction fault.

To make binaries optimized for the machine you build them on, use -march=native (with gcc, clang, or ICC).

MSVC provides an intrinsic for the x86 popcnt instruction, but unlike gcc it's really an intrinsic for the hardware instruction and requires hardware support.


Using std::bitset<>::count() instead of a built-in

In theory, any compiler that knows how to popcount efficiently for the target CPU should expose that functionality through ISO C++ std::bitset<>. In practice, you might be better off with the bit-hack AND/shift/ADD in some cases for some target CPUs.

For target architectures where hardware popcount is an optional extension (like x86), not all compilers have a std::bitset that takes advantage of it when available. For example, MSVC has no way to enable popcnt support at compile time, and its std::bitset<>::count always uses a table lookup, even with /Ox /arch:AVX (which implies SSE4.2, which in turn implies the popcnt feature.) (Update: see below; that does get MSVC's C++20 std::popcount to use x86 popcnt, but still not its bitset<>::count. MSVC could fix that by updating their standard library headers to use std::popcount when available.)

But at least you get something portable that works everywhere, and with gcc/clang with the right target options, you get hardware popcount for architectures that support it.

#include <bitset>
#include <limits>
#include <type_traits>

template<typename T>
//static inline  // static if you want to compile with -mpopcnt in one compilation unit but not others
typename std::enable_if<std::is_integral<T>::value,  unsigned >::type 
popcount(T x)
{
    static_assert(std::numeric_limits<T>::radix == 2, "non-binary type");

    // sizeof(x)*CHAR_BIT
    constexpr int bitwidth = std::numeric_limits<T>::digits + std::numeric_limits<T>::is_signed;
    // std::bitset constructor was only unsigned long before C++11.  Beware if porting to C++03
    static_assert(bitwidth <= std::numeric_limits<unsigned long long>::digits, "arg too wide for std::bitset() constructor");

    typedef typename std::make_unsigned<T>::type UT;        // probably not needed, bitset width chops after sign-extension

    std::bitset<bitwidth> bs( static_cast<UT>(x) );
    return bs.count();
}

See asm from gcc, clang, icc, and MSVC on the Godbolt compiler explorer.

x86-64 gcc -O3 -std=gnu++11 -mpopcnt emits this:

unsigned test_short(short a) { return popcount(a); }
    movzx   eax, di      # note zero-extension, not sign-extension
    popcnt  rax, rax
    ret

unsigned test_int(int a) { return popcount(a); }
    mov     eax, edi
    popcnt  rax, rax        # unnecessary 64-bit operand size
    ret

unsigned test_u64(unsigned long long a) { return popcount(a); }
    xor     eax, eax     # gcc avoids false dependencies for Intel CPUs
    popcnt  rax, rdi
    ret

PowerPC64 gcc -O3 -std=gnu++11 emits (for the int arg version):

    rldicl 3,3,0,32     # zero-extend from 32 to 64-bit
    popcntd 3,3         # popcount
    blr

This source isn't x86-specific or GNU-specific at all, but only compiles well with gcc/clang/icc, at least when targeting x86 (including x86-64).

Also note that gcc's fallback for architectures without single-instruction popcount is a byte-at-a-time table lookup. This isn't wonderful for ARM, for example.

C++20 has std::popcount(T)

Current libstdc++ headers unfortunately define it with a special case if(x==0) return 0; at the start, which clang doesn't optimize away when compiling for x86:

#include <bit>
int bar(unsigned x) {
    return std::popcount(x);
}

clang 11.0.1 -O3 -std=gnu++20 -march=nehalem:

# clang 11
    bar(unsigned int):                                # @bar(unsigned int)
        popcnt  eax, edi
        cmove   eax, edi         # redundant: if popcnt result is 0, return the original 0 instead of the popcnt-generated 0...
        ret

But GCC compiles nicely:

# gcc 10
        xor     eax, eax         # break false dependency on Intel SnB-family before Ice Lake.
        popcnt  eax, edi
        ret

Even MSVC does well with it, as long as you use -arch:AVX or later (and enable C++20 with -std:c++latest).

int bar(unsigned int) PROC                                 ; bar, COMDAT
        popcnt  eax, ecx
        ret     0
int bar(unsigned int) ENDP                                 ; bar

Upvotes: 256

Martin.Martinsson

Reputation: 2154

In C#, what about a one-liner:

BitOperations.PopCount(Mask);

Returns the population count (number of bits set) of a mask. Similar in behavior to the x86 instruction POPCNT. Compatible with x64! It uses an intrinsic (a built-in instruction of the x86 architecture) to count the number of bits very quickly in a 32-bit or 64-bit value.

NOTE: BitOperations.PopCount() is not CLS compliant. Take this under consideration.

Cheers

Upvotes: 0

limitedeternity

Reputation: 202

I'll contribute to @Arty's answer

MSVC provides the __popcnt16()/__popcnt()/__popcnt64() intrinsics.

The popcnt instruction, as noted in the documentation's "Remarks" section, is available as part of the SSE4 instruction set, and there is a relatively high chance of it not being available.

If you run code that uses these intrinsics on hardware that doesn't support the popcnt instruction, the results are unpredictable.

So, you need to implement a check as per "Remarks" section:

To determine hardware support for the popcnt instruction, call the __cpuid intrinsic with InfoType=0x00000001 and check bit 23 of CPUInfo[2] (ECX). This bit is 1 if the instruction is supported, and 0 otherwise.

Here's how you do it:

unsigned popcnt(const unsigned input)
{
    struct cpuinfo_t
    {
        union
        {
            int regs[4];

            struct
            {
                long eax, ebx, ecx, edx;
            };
        };

        cpuinfo_t() noexcept : regs() {}
    }
    cpuinfo;

    // EAX=1: Processor Info and Feature Bits
    __cpuid(cpuinfo.regs, 1);

    // ECX bit 23: popcnt
    if (_bittest(&cpuinfo.ecx, 23))
    {
        return __popcnt(input);
    }

    // Choose any fallback implementation you like, there's already a ton of them
    unsigned num = input;
    num = (num & 0x55555555) + (num >> 1 & 0x55555555);
    num = (num & 0x33333333) + (num >> 2 & 0x33333333);
    num = (num & 0x0F0F0F0F) + (num >> 4 & 0x0F0F0F0F);
    num = (num & 0x00FF00FF) + (num >> 8 & 0x00FF00FF);
    num = (num & 0x0000FFFF) + (num >> 16 & 0x0000FFFF);
    return num;
}

Upvotes: 0

vidit

Reputation: 6451

I think the fastest way—without using lookup tables and popcount—is the following. It counts the set bits with just 12 operations.

int popcount(int v) {
    v = v - ((v >> 1) & 0x55555555);                // put count of each 2 bits into those 2 bits
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // put count of each 4 bits into those 4 bits  
    return ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}

It works because you can count the total number of set bits by dividing in two halves, counting the number of set bits in both halves and then adding them up. Also know as Divide and Conquer paradigm. Let's get into detail..

v = v - ((v >> 1) & 0x55555555); 

The count of set bits in a two-bit field can be 0b00, 0b01 or 0b10. Let's try to work this out on 2 bits..

 ---------------------------------------------
 |   v    |   (v >> 1) & 0b0101   |  v - x   |
 ---------------------------------------------
   0b00           0b00               0b00   
   0b01           0b00               0b01     
   0b10           0b01               0b01
   0b11           0b01               0b10

This is what was required: the last column shows the count of set bits in every two-bit pair. If the two-bit value is >= 2 (0b10), then the AND produces 0b01; otherwise it produces 0b00.

v = (v & 0x33333333) + ((v >> 2) & 0x33333333); 

This statement should be easy to understand. After the first operation we have the count of set bits in every two bits, now we sum up that count in every 4 bits.

v & 0b00110011         //masks out even two bits
(v >> 2) & 0b00110011  // masks out odd two bits

We then sum up the above result, giving us the total count of set bits in 4 bits. The last statement is the most tricky.

c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;

Let's break it down further...

v + (v >> 4)

It's similar to the second statement; we are counting the set bits in groups of 4 instead. We know, because of our previous operations, that every nibble holds the count of set bits of its original 4 bits. Let's look at an example. Suppose we have the byte 0b01000010: the first nibble holds the count 4 (all four of its original bits were set) and the second holds the count 2. Now we add those nibbles together.

v = 0b01000010
(v >> 4) = 0b00000100
v + (v >> 4) = 0b01000010 + 0b00000100

It gives us the count of set bits of the byte in its low nibble (0b01000110), and therefore we mask off the upper four bits of every byte in the number (discarding them).

0b01000110 & 0x0F = 0b00000110

Now every byte has the count of set bits in it. We need to add them all up together. The trick is to multiply the result by 0x01010101, which has an interesting property. If our number has four bytes, A B C D, it will result in a new number with these bytes A+B+C+D B+C+D C+D D. A 4-byte number can have a maximum of 32 bits set, which can be represented as 0b00100000.

All we need now is the top byte, which holds the sum of the set bits of all the bytes, and we get it with >> 24. This algorithm was designed for 32-bit words but can be easily modified for 64-bit words.

Upvotes: 87

paxdiablo

Reputation: 881083

In my opinion, the "best" solution is the one that can be read by another programmer (or the original programmer two years later) without copious comments. You may well want the fastest or cleverest solution which some have already provided but I prefer readability over cleverness any time.

unsigned int bitCount (unsigned int value) {
    unsigned int count = 0;
    while (value > 0) {           // until all bits are zero
        if ((value & 1) == 1)     // check lower bit
            count++;
        value >>= 1;              // shift bits, removing lower bit
    }
    return count;
}

If you want more speed (and assuming you document it well to help out your successors), you could use a table lookup:

// Lookup table for fast calculation of bits set in 8-bit unsigned char.

static unsigned char oneBitsInUChar[] = {
//  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F (<- n)
//  =====================================================
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, // 0n
    1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, // 1n
    : : :
    4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8, // Fn
};

// Function for fast calculation of bits set in 16-bit unsigned short.

unsigned char oneBitsInUShort (unsigned short x) {
    return oneBitsInUChar [x >>    8]
         + oneBitsInUChar [x &  0xff];
}

// Function for fast calculation of bits set in 32-bit unsigned int.

unsigned char oneBitsInUInt (unsigned int x) {
    return oneBitsInUShort (x >>     16)
         + oneBitsInUShort (x &  0xffff);
}

These rely on specific data type sizes so they're not that portable. But, since many performance optimisations aren't portable anyway, that may not be an issue. If you want portability, I'd stick to the readable solution.

Upvotes: 213

Amisha Sahu

Reputation: 89

Naive Solution

Time Complexity is O(no. of bits in n)

int countSet(unsigned int n)
{
    int res=0;
    while(n!=0){
      res += (n&1);
      n >>= 1;      // logical right shift, like C unsigned or Java >>>
    }
   return res;
}

Brian Kernighan's algorithm

Time Complexity is O(number of set bits in n)

int countSet(unsigned int n)
{
  int res=0;
  while(n != 0)
  {
    n = (n & (n-1));
    res++;
  }
  return res;
} 

Lookup table method for a 32-bit number - in this method we break the 32-bit number into four chunks of 8 bits each

Time Complexity is O(1)

static unsigned char table[256]; /* the table size is 256,
                        the number of values i&0xFF (8 bits) can have */

void initialize() //holds the number of set bits from 0 to 255
{
  table[0]=0;
  for(unsigned int i=1;i<256;i++)
     table[i]=(i&1)+table[i>>1];
}

int countSet(unsigned int n)
{
  // 0xff is hexadecimal representation of 8 set bits.
  int res=table[n & 0xff];
  n=n>>8;
  res=res+ table[n & 0xff];
  n=n>>8;
  res=res+ table[n & 0xff];
  n=n>>8;
  res=res+ table[n & 0xff];
  return res;
}

Upvotes: 4

abelenky

Reputation: 64672

int countBits(int x)
{
    int n = 0;
    if (x) do n++;
           while(x=x&(x-1));
    return n;
}   

Or also:

int countBits(int x) { return (x)? 1+countBits(x&(x-1)): 0; }

7 1/2 years after my original answer, @PeterMortensen questioned if this was even valid C syntax. I posted a link to an online compiler showing that it is in fact perfectly valid syntax (code below).

#include <stdio.h>
int countBits(int x)
{
    int n = 0;
    if (x) do n++;           /* Totally Normal Valid code. */
           while(x=x&(x-1)); /* Nothing to see here.       */
    return n;
}   
 
int main(void) {
    printf("%d\n", countBits(25));
    return 0;
}
 

Output:

3

If you want to re-write it for clarity, it would look like:

if (x)
{
    do
    {
        n++;
    } while(x=x&(x-1));
}

But that seems excessive to my eye.

However, I've also realized the function can be made shorter, but perhaps more cryptic, written as:

int countBits(int x)
{
    int n = 0;
    while (x) x=(n++,x&(x-1));
    return n;
}   

Upvotes: 2

Arty

Reputation: 16737

I am providing one more algorithm, not mentioned above, called Parallel, taken from the Bit Twiddling Hacks page (the link is in the code below). The nice thing about it is that it is generic, meaning that the code is the same for bit sizes of 8, 16, 32, 64, and 128.

I checked the correctness of its values and its timings on 2^26 numbers for bit sizes of 8, 16, 32, and 64. See the timings below.

This algorithm is the first code snippet below. The other two are mentioned here just for reference, because I tested against them and compared to them.

The algorithms are coded in C++ to be generic, but they can easily be adapted to plain C.

#include <type_traits>
#include <cstdint>

template <typename IntT>
inline size_t PopCntParallel(IntT n) {
    // https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
    using T = std::make_unsigned_t<IntT>;

    T v = T(n);
    v = v - ((v >> 1) & (T)~(T)0/3);                           // temp
    v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3);      // temp
    v = (v + (v >> 4)) & (T)~(T)0/255*15;                      // temp
    return size_t((T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * 8); // count
}

Below are the two algorithms that I compared with. One is Kernighan's simple method with a loop, taken from the same page (the link is in the code).

template <typename IntT>
inline size_t PopCntKernighan(IntT n) {
    // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
    using T = std::make_unsigned_t<IntT>;
    T v = T(n);
    size_t c;
    for (c = 0; v; ++c)
        v &= v - 1; // Clear the least significant bit set
    return c;
}

The other one uses MSVC's built-in __popcnt16()/__popcnt()/__popcnt64() intrinsics, or __builtin_popcount()/__builtin_popcountll() of Clang/GCC (the documentation links are in the code below). These intrinsics should provide a very optimized version, possibly using a hardware instruction:

#ifdef _MSC_VER
    // https://learn.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=msvc-160
    #include <intrin.h>
    #define popcnt16 __popcnt16
    #define popcnt32 __popcnt
    #define popcnt64 __popcnt64
#else
    // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
    #define popcnt16 __builtin_popcount
    #define popcnt32 __builtin_popcount
    #define popcnt64 __builtin_popcountll
#endif

template <typename IntT>
inline size_t PopCntBuiltin(IntT n) {
    using T = std::make_unsigned_t<IntT>;
    T v = T(n);
    if constexpr(sizeof(IntT) <= 2)
        return popcnt16(uint16_t(v));
    else if constexpr(sizeof(IntT) <= 4)
        return popcnt32(uint32_t(v));
    else if constexpr(sizeof(IntT) <= 8)
        return popcnt64(uint64_t(v));
    else
        static_assert([]{ return false; }());
}

Below are the timings, in nanoseconds per one number. All timings are done for 2^26 random numbers. Timings are compared for all three algorithms and all bit sizes among 8, 16, 32, and 64. In sum, all tests took 16 seconds on my machine. The high-resolution clock was used.

08 bit Builtin 8.2 ns
08 bit Parallel 8.2 ns
08 bit Kernighan 26.7 ns

16 bit Builtin 7.7 ns
16 bit Parallel 7.7 ns
16 bit Kernighan 39.7 ns

32 bit Builtin 7.0 ns
32 bit Parallel 7.0 ns
32 bit Kernighan 47.9 ns

64 bit Builtin 7.5 ns
64 bit Parallel 7.5 ns
64 bit Kernighan 59.4 ns

128 bit Builtin 7.8 ns
128 bit Parallel 13.8 ns
128 bit Kernighan 127.6 ns

As one can see, the provided Parallel algorithm (first among three) is as good as MSVC's/CLang's intrinsic.


For reference, below is full code that I used to test speed/time/correctness of all functions.

As a bonus, this code (unlike the short code snippets above) also tests the 128-bit size, but only under Clang/GCC (not MSVC), as they have unsigned __int128.


#include <type_traits>
#include <cstdint>

using std::size_t;

#if defined(_MSC_VER) && !defined(__clang__)
    #define IS_MSVC 1
#else
    #define IS_MSVC 0
#endif

#if IS_MSVC
    #define HAS128 false
#else
    using int128_t = __int128;
    using uint128_t = unsigned __int128;
    #define HAS128 true
#endif

template <typename T> struct UnSignedT { using type = std::make_unsigned_t<T>; };
#if HAS128
    template <> struct UnSignedT<int128_t> { using type = uint128_t; };
    template <> struct UnSignedT<uint128_t> { using type = uint128_t; };
#endif
template <typename T> using UnSigned = typename UnSignedT<T>::type;

template <typename IntT>
inline size_t PopCntParallel(IntT n) {
    // https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
    using T = UnSigned<IntT>;

    T v = T(n);
    v = v - ((v >> 1) & (T)~(T)0/3);                           // temp
    v = (v & (T)~(T)0/15*3) + ((v >> 2) & (T)~(T)0/15*3);      // temp
    v = (v + (v >> 4)) & (T)~(T)0/255*15;                      // temp
    return size_t((T)(v * ((T)~(T)0/255)) >> (sizeof(T) - 1) * 8); // count
}

template <typename IntT>
inline size_t PopCntKernighan(IntT n) {
    // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
    using T = UnSigned<IntT>;
    T v = T(n);
    size_t c;
    for (c = 0; v; ++c)
        v &= v - 1; // Clear the least significant bit set
    return c;
}

#if IS_MSVC
    // https://learn.microsoft.com/en-us/cpp/intrinsics/popcnt16-popcnt-popcnt64?view=msvc-160
    #include <intrin.h>
    #define popcnt16 __popcnt16
    #define popcnt32 __popcnt
    #define popcnt64 __popcnt64
#else
    // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
    #define popcnt16 __builtin_popcount
    #define popcnt32 __builtin_popcount
    #define popcnt64 __builtin_popcountll
#endif

#define popcnt128(x) (popcnt64(uint64_t(x)) + popcnt64(uint64_t(x >> 64)))

template <typename IntT>
inline size_t PopCntBuiltin(IntT n) {
    using T = UnSigned<IntT>;
    T v = T(n);
    if constexpr(sizeof(IntT) <= 2)
        return popcnt16(uint16_t(v));
    else if constexpr(sizeof(IntT) <= 4)
        return popcnt32(uint32_t(v));
    else if constexpr(sizeof(IntT) <= 8)
        return popcnt64(uint64_t(v));
    else if constexpr(sizeof(IntT) <= 16)
        return popcnt128(uint128_t(v));
    else
        static_assert([]{ return false; }());
}

#include <random>
#include <vector>
#include <chrono>
#include <string>
#include <iostream>
#include <iomanip>
#include <map>

inline double Time() {
    static auto const gtb = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::duration<double>>(
        std::chrono::high_resolution_clock::now() - gtb).count();
}

template <typename T, typename F>
void Test(std::string const & name, F f) {
    std::mt19937_64 rng{123};
    size_t constexpr bit_size = sizeof(T) * 8, ntests = 1 << 6, nnums = 1 << 14;
    std::vector<T> nums(nnums);
    for (size_t i = 0; i < nnums; ++i)
        nums[i] = T(rng() % ~T(0));
    static std::map<size_t, size_t> times;
    double min_time = 1000;
    for (size_t i = 0; i < ntests; ++i) {
        double timer = Time();
        size_t sum = 0;
        for (size_t j = 0; j < nnums; j += 4)
            sum += f(nums[j + 0]) + f(nums[j + 1]) + f(nums[j + 2]) + f(nums[j + 3]);
        auto volatile vsum = sum;
        min_time = std::min(min_time, (Time() - timer) / nnums);
        if (times.count(bit_size) && times.at(bit_size) != sum)
            std::cout << "Wrong bit cnt checksum!" << std::endl;
        times[bit_size] = sum;
    }
    std::cout << std::setw(2) << std::setfill('0') << bit_size
        << " bit " << name << " " << std::fixed << std::setprecision(1)
        << min_time * 1000000000 << " ns" << std::endl;
}

int main() {
    #define TEST(T) \
        Test<T>("Builtin", PopCntBuiltin<T>); \
        Test<T>("Parallel", PopCntParallel<T>); \
        Test<T>("Kernighan", PopCntKernighan<T>); \
        std::cout << std::endl;
    
    TEST(uint8_t); TEST(uint16_t); TEST(uint32_t); TEST(uint64_t);
    #if HAS128
        TEST(uint128_t);
    #endif
    
    #undef TEST
}

Upvotes: 1

Jerry An

Reputation: 1432

Python solution:

def hammingWeight(n: int) -> int:
    sums = 0
    while (n!=0):
        sums+=1
        n = n &(n-1)

    return sums

In the binary representation, the least significant 1-bit in n always corresponds to a 0-bit in n - 1. Therefore, ANDing the two numbers n and n - 1 always flips the least significant 1-bit in n to 0 and keeps all other bits the same.


Upvotes: 3

Varun Gusain

Reputation: 1

You can do:

while(n){
    n = n & (n-1);
    count++;
}

The logic behind this is that the bits of n-1 are inverted, starting from the rightmost set bit of n.

If n = 6, i.e., 110, then 5 is 101: the bits are inverted starting from the rightmost set bit of n.

So if we & these two, we clear the rightmost set bit in every iteration and always move on to the next rightmost set bit, thereby counting the set bits. The worst-case time complexity is O(log n), when every bit is set.

Upvotes: 7

Jfm Meyers

Reputation: 133

Kotlin pre 1.4

        fun NumberOfSetBits(i: Int): Int {
            var i = i
            i -= (i ushr 1 and 0x55555555)
            i = (i and 0x33333333) + (i ushr 2 and 0x33333333)
            return (i + (i ushr 4) and 0x0F0F0F0F) * 0x01010101 ushr 24
        }

This is more or less a copy of the code in the top answer, with the Java fixes applied, converted using the Java-to-Kotlin converter in IntelliJ IDEA Community Edition.

Kotlin 1.4 and beyond (as of 2021-05-05; it could change in the future):

        fun NumberOfSetBits(i: Int): Int {
            return i.countOneBits()
        }

Under the hood it uses Integer.bitCount as seen here:

@SinceKotlin("1.4")
@WasExperimental(ExperimentalStdlibApi::class)
@kotlin.internal.InlineOnly
public actual inline fun Int.countOneBits(): Int = Integer.bitCount(this)

Upvotes: 0

monoceres

Reputation: 4770

Here is the functional master race recursive solution, and it is by far the purest one (and can be used with any bit length!):

template<typename T>
int popcnt(T n)
{
  if (n>0)
    return (n & 1) + popcnt(n >> 1); // parentheses needed: '+' binds tighter than '&'
  return 0; 
}

Upvotes: 0

Arjun Singh

Reputation: 5

A simple algorithm to count the number of set bits:

int countbits(unsigned int n) {
    int count = 0;
    while(n != 0) {
        n = n & (n-1);
        count++;
    }
    return count;
}

Take the example of 11 (1011) and try manually running through the algorithm: 1011 → 1010 → 1000 → 0000, so the loop runs three times and count ends up as 3. It should help you a lot!

Upvotes: 0

stacktay

Reputation: 81

private int get_bits_set(int v)
{
    int c; // 'c' accumulates the total bits set in 'v'
    for (c = 0; v != 0; c++)   // use v != 0 so negative values are handled too
    {
        v &= v - 1; // Clear the least significant bit set
    }
    return c;
}

Upvotes: 11

decrypto

Reputation: 1

This will also work fine:

int ans = 0;
while(num) {
  ans += (num & 1);
  num = num >> 1;
}    
return ans;

Upvotes: -4

Anders Cedronius

Reputation: 2076

Another Hamming weight algorithm if you're on a BMI2 capable CPU:

the_weight = __tzcnt_u64(~_pext_u64(data[i], data[i]));
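
Why this works: _pext_u64(x, x) uses the value as its own mask, packing the k set bits of x into the low k positions and producing 2^k - 1; inverting that and counting trailing zeros recovers k. A self-contained sketch (the popcount_bmi2 name is mine, and I use the documented _tzcnt_u64 spelling; it needs BMI1/BMI2, e.g. gcc -mbmi -mbmi2):

#include <immintrin.h>   // _pext_u64 (BMI2), _tzcnt_u64 (BMI1)
#include <stdint.h>

/* _pext_u64(x, x) packs the k set bits of x into the low k bits, giving
 * 2^k - 1; inverting and counting trailing zeros then yields k.
 * Edge cases: k = 0 gives tzcnt(~0) = 0, and k = 64 gives tzcnt(0) = 64. */
static inline uint64_t popcount_bmi2(uint64_t x)
{
    return _tzcnt_u64(~_pext_u64(x, x));
}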

Upvotes: 1

Nikhil Katre

Reputation: 2234

I am giving two algorithms to answer the question:

package countSetBitsInAnInteger;

import java.util.Scanner;

public class UsingLoop {

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        try {
            System.out.println("Enter a integer number to check for set bits in it");
            int n = in.nextInt();
            System.out.println("Using while loop, we get the number of set bits as: " + usingLoop(n));
            System.out.println("Using Brain Kernighan's Algorithm, we get the number of set bits as: " + usingBrainKernighan(n));
            System.out.println("Using ");
        }
        finally {
            in.close();
        }
    }

    private static int usingBrainKernighan(int n) {
        int count = 0;
        while(n > 0) {
            n &= (n-1);
            count++;
        }
        return count;
    }

    /*
        Analysis:
            Time complexity = O(lgn)
            Space complexity = O(1)
    */

    private static int usingLoop(int n) {
        int count = 0;
        for(int i=0; i<32; i++) {
            if((n&(1 << i)) != 0)
                count++;
        }
        return count;
    }

    /*
        Analysis:
            Time Complexity = O(32) // Maybe the complexity is O(lgn)
            Space Complexity = O(1)
    */
}

Upvotes: -1

iamdeit

Reputation: 6035

I always use this in competitive programming, and it's easy to write and is efficient:

#include <bits/stdc++.h>

using namespace std;

int countOnes(int n) {
    bitset<32> b(n);
    return b.count();
}

Upvotes: 6

KeineKaefer

Reputation: 1

Convert the integer to a binary string and count the ones.

PHP solution:

substr_count(decbin($integer), '1');

Upvotes: 0

dadhi

Reputation: 4950

A fast C# solution using a pre-calculated table of Byte bit counts with branching on the input size.

public static class BitCount
{
    public static uint GetSetBitsCount(uint n)
    {
        var counts = BYTE_BIT_COUNTS;
        return n <= 0xff ? counts[n]
             : n <= 0xffff ? counts[n & 0xff] + counts[n >> 8]
             : n <= 0xffffff ? counts[n & 0xff] + counts[(n >> 8) & 0xff] + counts[(n >> 16) & 0xff]
             : counts[n & 0xff] + counts[(n >> 8) & 0xff] + counts[(n >> 16) & 0xff] + counts[(n >> 24) & 0xff];
    }

    public static readonly uint[] BYTE_BIT_COUNTS =
    {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
    };
}

Upvotes: 6

prongs

Reputation: 9606

I use the following function. I haven't checked benchmarks, but it works.

int msb(int num)
{
    int m = 0;
    for (int i = 16; i > 0; i = i>>1)
    {
        // debug(i, num, m);
        if(num>>i)
        {
            m += i;
            num>>=i;
        }
    }
    return m;
}

Upvotes: -2

Andry

Reputation: 2727

For those who want it in C++11 for any unsigned integer type as a constexpr function (tacklelib/include/tacklelib/utility/math.hpp):

#include <stdint.h>
#include <limits>
#include <type_traits>

const constexpr uint32_t uint32_max = (std::numeric_limits<uint32_t>::max)();

namespace detail
{
    template <typename T>
    inline constexpr T _count_bits_0(const T & v)
    {
        return v - ((v >> 1) & 0x55555555);
    }

    template <typename T>
    inline constexpr T _count_bits_1(const T & v)
    {
        return (v & 0x33333333) + ((v >> 2) & 0x33333333);
    }

    template <typename T>
    inline constexpr T _count_bits_2(const T & v)
    {
        return (v + (v >> 4)) & 0x0F0F0F0F;
    }

    template <typename T>
    inline constexpr T _count_bits_3(const T & v)
    {
        return v + (v >> 8);
    }

    template <typename T>
    inline constexpr T _count_bits_4(const T & v)
    {
        return v + (v >> 16);
    }

    template <typename T>
    inline constexpr T _count_bits_5(const T & v)
    {
        return v & 0x0000003F;
    }

    template <typename T, bool greater_than_uint32>
    struct _impl
    {
        static inline constexpr T _count_bits_with_shift(const T & v)
        {
            return
                detail::_count_bits_5(
                    detail::_count_bits_4(
                        detail::_count_bits_3(
                            detail::_count_bits_2(
                                detail::_count_bits_1(
                                    detail::_count_bits_0(v)))))) + count_bits(v >> 32);
        }
    };

    template <typename T>
    struct _impl<T, false>
    {
        static inline constexpr T _count_bits_with_shift(const T & v)
        {
            return 0;
        }
    };
}

template <typename T>
inline constexpr T count_bits(const T & v)
{
    static_assert(std::is_integral<T>::value, "type T must be an integer");
    static_assert(!std::is_signed<T>::value, "type T must be not signed");

    return uint32_max >= v ?
        detail::_count_bits_5(
            detail::_count_bits_4(
                detail::_count_bits_3(
                    detail::_count_bits_2(
                        detail::_count_bits_1(
                            detail::_count_bits_0(v)))))) :
        detail::_impl<T, sizeof(uint32_t) < sizeof(v)>::_count_bits_with_shift(v);
}

Plus tests using the Google Test library:

#include <stdlib.h>
#include <time.h>

namespace {
    template <typename T>
    inline uint32_t _test_count_bits(const T & v)
    {
        uint32_t count = 0;
        T n = v;
        while (n > 0) {
            if (n % 2) {
                count += 1;
            }
            n /= 2;
        }
        return count;
    }
}

TEST(FunctionsTest, random_count_bits_uint32_100K)
{
    srand(uint_t(time(NULL)));
    for (uint32_t i = 0; i < 100000; i++) {
        const uint32_t r = uint32_t(rand()) + (uint32_t(rand()) << 16);
        ASSERT_EQ(_test_count_bits(r), count_bits(r));
    }
}

TEST(FunctionsTest, random_count_bits_uint64_100K)
{
    srand(uint_t(time(NULL)));
    for (uint32_t i = 0; i < 100000; i++) {
        const uint64_t r = uint64_t(rand()) + (uint64_t(rand()) << 16) + (uint64_t(rand()) << 32) + (uint64_t(rand()) << 48);
        ASSERT_EQ(_test_count_bits(r), count_bits(r));
    }
}

Upvotes: 0

Lance Pollard

Reputation: 79168

For JavaScript, you can count the number of set bits in a 32-bit value using a lookup table (and this code can be easily translated to C). In addition, I added 8-bit and 16-bit versions for completeness, for people who find this through a web search.

const COUNT_BITS_TABLE = makeLookupTable()

function makeLookupTable() {
  const table = new Uint8Array(256)
  for (let i = 0; i < 256; i++) {
    table[i] = (i & 1) + table[(i / 2) | 0];
  }
  return table
}

function countOneBits32(n) {
  return COUNT_BITS_TABLE[n & 0xff] +
    COUNT_BITS_TABLE[(n >> 8) & 0xff] +
    COUNT_BITS_TABLE[(n >> 16) & 0xff] +
    COUNT_BITS_TABLE[(n >> 24) & 0xff];
}

function countOneBits16(n) {
  return COUNT_BITS_TABLE[n & 0xff] +
    COUNT_BITS_TABLE[(n >> 8) & 0xff]
}

function countOneBits8(n) {
  return COUNT_BITS_TABLE[n & 0xff]
}

console.log('countOneBits32', countOneBits32(0b10101010000000001010101000000000))
console.log('countOneBits32', countOneBits32(0b10101011110000001010101000000000))
console.log('countOneBits16', countOneBits16(0b1010101000000000))
console.log('countOneBits8', countOneBits8(0b10000010))

Upvotes: -1

Steven Chou

Reputation: 2215

For Java, there is a java.util.BitSet. https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html

cardinality(): Returns the number of bits set to true in this BitSet.

The BitSet is memory efficient since it is backed by an array of longs.

Upvotes: 0

Boštjan Mejak

Reputation: 977

From Python 3.10 onwards, you will be able to use the int.bit_count() function, but for the time being, you can define this function yourself.

def bit_count(integer):
    return bin(integer).count("1")

Upvotes: 1
