rxu
rxu

Reputation: 1407

convert bitarray to set

How to convert bitarray to set quickly with c++? Each of the actual bitarrays has 750,000 bits.

Example 1:

bitarray: 01011111
set: {0,1,2,3,4,5,7}
or set: {1,3,4,5,6,7}

Example 2:

bitarray: 0101 1111 0001 0001
set: {0,4,8,9,10,11,12,14}
or set: {1,3,4,5,6,7,11,15}

The set is an array of usigned 32 bit integers (uint32_t). Both kinds of set are acceptable.

The bitarray is contiguous in memory. The first bit of the bitarray has correct alignment for simd. For now I am using a custom memory allocator with std::vector to hold the bitarray. 1 bit in memory per 1 bit in the bitarray.

Thanks.

Update:

this so question does the reverse

loop through bits in c

How to define and work with an array of bits in C?

gmpy uses the scan1 function of the gmp library. scan1 seems find first set, as in wikipedia here

Upvotes: 0

Views: 190

Answers (2)

rxu
rxu

Reputation: 1407

code:

#include <iostream>
#include <vector>
#include <time.h>

using namespace std;

template <typename T>
uint32_t bitarray2set(T& v, uint32_t * ptr_set){
    uint32_t i;
    uint32_t base = 0;
    uint32_t * ptr_set_new = ptr_set;
    uint32_t size = v.capacity();
    for(i = 0; i < size; i++){
        find_set_bit(v[i], ptr_set_new, base);
        base += 8*sizeof(uint32_t);
    }
    return (ptr_set_new - ptr_set);
}

inline void find_set_bit(uint32_t n, uint32_t*& ptr_set, uint32_t base){
    // Find the set bits in a uint32_t
    int k = base;
    while(n){
        if (n & 1){
            *(ptr_set) = k;
            ptr_set++;
        }
        n = n >> 1;
        k++;
    }
}

template <typename T>
void rand_vector(T& v){
    srand(time(NULL));
    int i;
    int size = v.capacity();
    for (i=0;i<size;i++){
        v[i] = rand();
    }
}

template <typename T>
void print_vector(T& v, int size_in = 0){
    int i;

    int size;
    if (size_in == 0){
        size = v.capacity();
    } else {
        size = size_in;
    }
    for (i=0;i<size;i++){
        cout << v[i] << ' ';
    }
    cout << endl;
}

int main(void){
    const int test_size = 6000;
    vector<uint32_t> vec(test_size);
    vector<uint32_t> set(test_size*sizeof(uint32_t)*8);
    rand_vector(vec);
    //for (int i; i < 64; i++) vec[i] = -1;
    //cout << "input" << endl;
    print_vector(vec);
    //cout << "calculate result" << endl;

    int i;
    int rep = 10000;
    uint32_t res_size;

    struct timespec tp_start, tp_end;
    clock_gettime(CLOCK_MONOTONIC, &tp_start);
    for (i=0;i<rep;i++){
        res_size = bitarray2set(vec, set.data());
    }
    clock_gettime(CLOCK_MONOTONIC, &tp_end);
    double timing;
    const double nano = 0.000000001;

    timing = ((double)(tp_end.tv_sec  - tp_start.tv_sec )
           + (tp_end.tv_nsec - tp_start.tv_nsec) * nano) /(rep);

    cout << "timing per cycle: " << timing << endl;
    cout << "print result" << endl;
    //print_vector(set, res_size);
}

result (compiled with icc -O3 code.cpp -lrt)

...
timing per cycle: 0.000739613
print result

0.0008 seconds to convert 768000 bits to set. But there are at least 10,000 arrays of 768,000 bits in each cycle. That is 8 seconds per cycle. That is slow.

Upvotes: 0

jxh
jxh

Reputation: 70492

If I understand your question:

for (int i = 0; i < 750000; ++i) {
    if (bitarray_has(bitarray, i)) {
        set_of_numbers.push_back(i);
    }
}

I don't believe walking bitarray will be particularly slow, but the push_back() can be made faster if you know how many elements will be created. Then you can use reserve() to pre-allocate the memory.

Upvotes: 1

Related Questions