AwaitedOne
AwaitedOne

Reputation: 1012

Generate hash for boost::dynamic_bitset and convert hash back to boost::dynamic_bitset

I am looking to generate a boost::dynamic_bitset hash so that I can store the value in boost::bimaps. I tried the following code,Test code here.

#include <iostream>
#include <boost/dynamic_bitset.hpp>
#include <unordered_map>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
#include <boost/bimap/set_of.hpp>
#include <boost/bimap/multiset_of.hpp>

#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        return boost::hash_value(bs.m_bits);
    }
    }

namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<unsigned long long int>,
        bimaps::unordered_multiset_of<size_t> > bimap_reference;
typedef bimap_reference::value_type position;
bimap_reference reference_index_vector;

int main()
{

    std::string str = "1011010001101101000001101101000011111111011010000011011010000111111111110110100011011010000011011010000111111110110100000110110100001111111111";
    boost::dynamic_bitset<>  bits = boost::dynamic_bitset<> (str);
    std::cout << "bitmap " << bits << std::endl;
    std::cout << "Number of bits " << bits.count() << std::endl;

    size_t hash1 = boost::hash_value (bits);
    std::cout << "Hash value "  << hash1 << std::endl;


/* Insert hash value in bimap
 *
 */

//    reference_index_vector.insert(position(10000000000, hash1));
//    for( bimap_reference::const_iterator iter = reference_index_vector.begin(), iend = reference_index_vector.end();
//            iter != iend; ++iter ) {
//        std::cout << iter->left << " <--> "<< iter->right <<std::endl;
//    }

    return 0;
}

I get the error

In file included from /usr/include/boost/dynamic_bitset.hpp:15:0, from 3: In instantiation of 'std::size_t boost::hash_value(const boost::dynamic_bitset&) [with B = long unsigned int; A = std::allocator; std::size_t = long unsigned int]': 34:40: required from here /usr/include/boost/dynamic_bitset/dynamic_bitset.hpp:422:17: error: 'boost::dynamic_bitset<>::buffer_type boost::dynamic_bitset<>::m_bits' is private buffer_type m_bits; ^ 16:37: error: within this context

Not sure what is going wrong.

  1. How to hash boost::dynamic_bitset
  2. How to convert the hash back to orignial bitset.
  3. Total space needed (count of both 0 and 1 or only 1). Above code shows 80 bits only by bits.count(). I tried the following to generate the hash value, but not sure how much space is needed.

Also, I tried to generate the hash value of the bitset by the following code

/*Generating hash by bitset
     * 
     */
    std::bitset<142> seq (str);
    std::hash<std::bitset<142>> hash_bitset;
    std::cout << "Bitset " << seq << std::endl;
    std::cout << "Hash value " << hash_bitset(seq) << std::endl; 

    #Bitset 1011010001101101000001101101000011111111011010000011011010000111111111110110100011011010000011011010000111111110110100000110110100001111111111
    #Hash value 4886653603414440856

Upvotes: 2

Views: 811

Answers (1)

sehe
sehe

Reputation: 392833

Ok, I'm detecting much confusion about the essence of "hashing", so a few friendly pointers to get started:

Q. 2. How to convert the hash back to orignial bitset.

That's impossible. A hash is lossy digest. You can only do this if the hash is a Perfect Hash which, due to the laws of entropy, cannot happen if the bitset capacity exceeds the size of size_t on your platform (typically 32 or 64 bits).

Q. I also tried creating a hash by ...

 std::bitset<142> seq (str);
 ....

I hope you realize that std::bitset<> is an entirely different type, so it's not really related to the task. And, since it's not dynamic, it's rather unhelpful for the task, even as a workaround.

But Most Importantly:

Hashes are used by hash-tables (like unordered_*<>) but they are not stored. Hashes are lossy digests, only used to get a good distribution over the internal buckets¹. For actual element equality, std::equal<T> is still used.

In other words:

typedef boost::bimap<bimaps::unordered_set_of<unsigned long long int>,
        bimaps::unordered_multiset_of<size_t> > bimap_reference;

is unsuited for creating a map of anything other than size_t or unsigned long long². If you store hashes of things there:

reference_index_vector.insert(position(10000000000, hash1));

, you lose the original information. There's no way to get the bitset from hash1.

The compiler error

Your hash_value implementation mistakenly uses private members of dynamic_bitset<>. You can't because it's not accessible.

Here's a simple implementation of std::hash<> using the public interface:

Live On Coliru

#include <boost/dynamic_bitset.hpp>
#include <boost/functional/hash.hpp>
#include <unordered_map>
#include <sstream>

namespace std {

    template <typename Block, typename Alloc> struct hash<boost::dynamic_bitset<Block, Alloc> > {
        size_t operator()(boost::dynamic_bitset<Block, Alloc> const& bs) const {
            size_t seed = boost::hash_value(bs.size());

            std::vector<Block> blocks(bs.num_blocks());
            boost::hash_range(seed, blocks.begin(), blocks.end());

            return seed;
        }
    };

}

int main() {
    boost::dynamic_bitset<> x, y;
    x.resize(rand()%100, 1);
    y.resize(rand()%100, 0);

    std::unordered_map<boost::dynamic_bitset<>, std::string> m;
    m[x] = "x";
    m[y] = "y";
}

You can use this std::hash<> specialization instead and use boost::bimap with it.

NOTE, that using the public interface is not optimal, because it copies the Blocks (you also did that with the std::bitset<> hack). You might be interested in the Boost Serialization implementation I did for boost::dynamic_bitset<> before:


¹ Assuming, for simplicity, buckets instead of "open addressing" style. The same logic applies there, but somewhat more complicated

² (by the way, please just say uintmax_t or uint64_t)

Upvotes: 3

Related Questions