Gelliant
Gelliant

Reputation: 1845

Find consecutive ones and zeros

I am looking for the fastest way to convert a stream of integers into a list that counts consecutive ones and zeros.

For example the integers [4294967295,4194303,3758096384]

are at bit level:

11111111111111111111111111111111
11111111111111111111110000000000
00000000000000000000000000000111

(each string of bits is in little-endian order)

So the program should output three values: [54 39 3] There are 54 ones, followed by 39 zeros, and finally 3 ones.

I have been looking into these algorithms: http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightLinear

Probably I need to write something along these lines

i=(the first bit of the first integer)
repeat till the end
    find the number of consecutive i's in this integer
    if we reach the end of the integer, continue with the next
    else i = (not)i

But I was wondering if someone can think of a better way to do it.

At the moment the function is build in Matlab like this:

%get all bits in a long vector
data = uint32([4294967295,4194303,3758096384]);
logi = false([1,length(data)*32]);
for ct = 1:length(data)
    logi(1+32*(ct-1):ct*32)=bitget(data(1+(ct-1)),1:32);
end
%count consecutive 1s and 0s
Lct=1;
L=1;i = logi(1);
for ct = 2:length(logi)
    if logi(ct)==i
        L(Lct)=L(Lct)+1;
    else
        i=logi(ct);
        Lct=Lct+1;
        L(Lct)=1;
    end
end

>> L = 54    39     3

Note: It took me some time to make the problem clear. Hence the comments about language and the exact nature of the problem. Hopefully (after many edits) this question is now in a form where it can be found and the answer can be useful to others as well.

Upvotes: 3

Views: 1607

Answers (3)

Luis Colorado
Luis Colorado

Reputation: 12668

First of all, to say that your sample numbers are wrong, as the second has the most significant bit at one, it should be larger than 2147483643, but it is only 4194303, and the third one should be 7, so I guess you have inverted the bit positions when converting them to decimal. See my last complete code for a comment at the beginning of main(), on how the numbers have been determined (to look like in your example) The numbers corresponding to your bit pattern are (hex/dec):

[0xffffffff/4294967295][0xfffffc00/4294966272][0x00000007/7]

(if we put the more weight digits at the left, why don't we do also in binary?)

To solve your problem, you can consider that when you have n consecutive ones in the LSB part of a number, and you increment that value by one, then you have all those consecutive ones switched to zeros (by means of carry propagation) until the next o the last one you have, and if you have n consecutive zeros and decrement the value, then you have all these zeros convert into ones... well, with one more bit, as carry cascades again one bit further. The idea is to check what bit do we have in the LSB, and depending on this, increment or decrement the value and XOR it with the original value.... the result you will get is a number that has as many ones in the LSBs as equal bits to the LSB, plus one, e.g:

 1100100011111111

as the LSB is 1, we increment it:

 1100100100000000
        ^^^^^^^^^ changed bits.

if we now xor this value with the previous:

 0000000111111111  => 9 "1" bits, that indicate that 8 "1" consecutive bits were present

if we prepare a switch statement with all the possible values we can get from this function, you can get a very effective way to the following result:

 int get_consecutive_bits(unsigned value)
 {
     unsigned next = value;
     switch (value) {
     case 0: case ~0: return 32; /* these are special cases, see below */
     }
     switch (value & 1) { /* get the lower bit */
     case 0: next--; break; /* decrement */
     case 1: next++; break; /* increment */
     }
     switch (value ^ next) { /* make the xor */
     case 0x00000003: return 1;
     case 0x00000007: return 2;
     case 0x0000000f: return 3;
     case 0x0000001f: return 4;
     case 0x0000003f: return 5;
     case 0x0000007f: return 6;
     /* ... */
     case 0xffffffff: return 31;
     } /* switch */
 }

Now you have to accumulate that value in case the next array cell begins with the same bit value as you finished the previous. The reason we never have a case statement of 0x00000001 is that we are forcing a carry in the second bit, so we always have a value of 1 or more, with two bits changed (...0000001 => ...0000010 => ...0000011 and ...11111110 => ...11111101 => ...00000011) and this also means that for values 0000...0000 and 1111...1111 we should get one bit more than the word length, making these values special (as they make the carry go to the next bit to the msb, the 33rd) so we check for those values first.

This is a very efficient way to do the task in chunks of one array cell. You have to accumulate, when the value you get includes the MSB, as the next word can start with that same bit you ended before.

The next code should illustrate the algorithm:

pru_49297910.c

/* pru_49297910.c -- answer to https://stackoverflow.com/questions/49297910/
 * Author: Luis Colorado <[email protected]>
 * Date: Wed Apr 24 11:12:21 EEST 2019
 * Copyright: (C) Luis Colorado.  All rights reserved.
 * License: BSD.  Open source.
 */

#include <cassert>
#include <iostream>

#define BITS_PER_ELEMENT    32

int get_consecutive_bits(unsigned value)
{
    switch (value) {
    case 0: case ~0: /* these are special cases, see below */
            return BITS_PER_ELEMENT;
    }
    unsigned next = value;
    switch (value & 1) { /* get the lower bit */
    case 0: next--; break; /* decrement */
    case 1: next++; break; /* increment */
    }
    switch (value ^ next) { /* make the xor */
    case 0x00000003: return 1;      case 0x00000007: return 2;
    case 0x0000000f: return 3;      case 0x0000001f: return 4;
    case 0x0000003f: return 5;      case 0x0000007f: return 6;
    case 0x000000ff: return 7;      case 0x000001ff: return 8;
    case 0x000003ff: return 9;      case 0x000007ff: return 10;
    case 0x00000fff: return 11;     case 0x00001fff: return 12;
    case 0x00003fff: return 13;     case 0x00007fff: return 14;
    case 0x0000ffff: return 15;     case 0x0001ffff: return 16;
    case 0x0003ffff: return 17;     case 0x0007ffff: return 18;
    case 0x000fffff: return 19; case 0x001fffff: return 20;
    case 0x003fffff: return 21; case 0x007fffff: return 22;
    case 0x00ffffff: return 23; case 0x01ffffff: return 24;
    case 0x03ffffff: return 25; case 0x07ffffff: return 26;
    case 0x0fffffff: return 27; case 0x1fffffff: return 28;
    case 0x3fffffff: return 29; case 0x7fffffff: return 30;
    case 0xffffffff: return 31;
    } /* switch */
    assert(!"Impossible");
    return 0;
}

#define FLUSH() do{                         \
            runlen(accum, state);   \
        state ^= 1;                         \
        accum = 0;                          \
    } while (0)

void run_runlen_encoding(unsigned array[], int n, void (*runlen)(int, unsigned))
{
    int state = 0; /* always begin in 0 */
    int accum = 0; /* accumulated bits */
    while (n--) {
        /* see if we have to change */
        if (state ^ (array[n] & 1)) /* we changed state */
                    FLUSH();
            int nb = BITS_PER_ELEMENT; /* number of bits to check */
            int w = array[n];
        while (nb > 0) {
                    int b = get_consecutive_bits(w);
                    if (b < nb) {
                            accum += b;
                            FLUSH();
                            w >>= b;
                            nb -= b;
                    } else {  /* b >= nb, we only accumulate nb */
                accum += nb;
                            nb = 0;
                    }
            }
    }
    if (accum)
            FLUSH();
} /* run_runlen_encoding */

void output_runlen(int n, unsigned kind)
{
    if (n) { /* don't print for n == 0 */
            static int i = 0;
            std::cout << "[" << n << "/" << kind << "]";
            if (!(++i % 10))
                    std::cout << std::endl;
    }
} /* output_runlen */

int main()
{
     /* 0b1111_1111_1111_1111_1111_1111_1111_1111, 0b1111_1111_1111_1111_1111_1100_0000_0000, 0b0000_0000_0000_0000_0000_0000_0000_0111 */
     /*    0xf____f____f____f____f____f____f____f,    0xf____f____f____f____f____c____0____0,    0x0____0____0____0____0____0____0____7 */
     /*                                0xffffffff,                                0xfffffc00,                                0x00000007 */
    unsigned int array[] =
#if 1
        { 0xffffffff, 0xfffffc00, 0x00000007 }; /* correct values for your example */
#else
            { 4294967295, 4194303, 3758096384 }; /* original values, only first matches. */
#endif
    size_t array_n = sizeof array / sizeof array[0];

    run_runlen_encoding(array, array_n, output_runlen);
    std::cout << std::endl;
} /* main */

Note:

As we needed to compute how far the carry bit jumps in one increment, we have to go from the less significant bit to the most, making the output just the reverse order than you tried, but I'm sure you will be able to change the order to make it appear as you stated in the question.

The program output shows:

$ pru_49297910
[3/1][39/0][54/1]

Upvotes: 1

Paweł Iwaneczko
Paweł Iwaneczko

Reputation: 929

Earlier I had misunderstood the question. Now i know what You were asking. This should work, I've tested it:

#include <iostream>
#include <deque>

using namespace std;

//old version for whole collection
void ConsecutiveOnesAndZeros(deque<uint32_t> values, deque<uint8_t> &outCount)
{
    int i;
    if (!values.empty()) {
        uint8_t count = 0, lastBit = (values[0] & 1);
        for (uint32_t &value : values)
        {
            for (i = 0; (i < 32) && (value != 0); i++)
            {
                if (lastBit != uint8_t((value >> i) & 1))
                {
                    outCount.push_back(count);
                    count = 0;
                    lastBit = !lastBit;
                }
                count++;
            }
            if (i < 32) count += (32 - i);
        }
        outCount.push_back(count);
    }
}

//stream version for receiving integer
void ConsecutiveOnesAndZeros(uint32_t value, uint8_t &count, uint8_t &lastBit, deque<uint8_t> &outCount)
{
    int i;
    for (i = 0; (i < 32) && (value != 0); i++)
    {
        if (lastBit != uint8_t((value >> i) & 1))
        {
            if(count) outCount.push_back(count);
            count = 0;
            lastBit = !lastBit;
        }
        count++;
    }
    if (i < 32) count += (32 - i);
}

int main()
{
    deque<uint8_t> outCount;
    deque<uint32_t> stream = { 4294967295u,4194303u,3758096384u };

    ConsecutiveOnesAndZeros(stream, outCount);
    for (auto res : outCount) {
        printf_s("%d,", res);
    }
    printf_s("\n");

    uint8_t count = 0, bit = 0;
    outCount.clear();
    for (auto val : stream) 
        ConsecutiveOnesAndZeros(val, count, bit, outCount);
    if (count) outCount.push_back(count);

    for (auto res : outCount) {
        printf_s("%d,", res);
    }
    printf_s("\n");

    system("pause");
}

UPDATE - I've made a little optimisation of checking value != 0. I've also divided ConsecutiveOnesAndZeros to two functions for giving next integer from received stream.

Upvotes: 2

v010dya
v010dya

Reputation: 5848

Well, you could try to make it faster by splitting the first part into threads.

For example, if you have a function that you described you would call several of them as std::thread or std::future depending on how you wish to approach it. After they all finish you could compare the two border bits (one at the end of the previous, and one at the start of the next) and either add the first result count to the last result count or push the result onto the result of the previous, all other parts of the result get pushed onto previous without any comparison.

This of course will be overdoing it if your input is quite short.

Upvotes: 1

Related Questions