Reputation: 617
This might be a silly question, but here goes :
I hashed a dictionary of words into an unordered_set based hash-table. My hash function was made intentionally "bad", in that all strings that contained the same set of letters would hash to the same value. I initially tried to over-ride the normal hash function behaviour, and use a "frequency histogram" of the letters in each word as a hash value (which I learnt was impossible :) ), but one of the threads suggested using a 26-bit bitmask to achieve the same. The hash function works fine and dandy thus far.
For example, in my scheme, CITIED and CITED hash to the same value, 1049144. My idea was that given a set of letters, I wanted to find all the words containing letters from that set.
I am guessing that I haven't quite understood the concept of hashing (or my code is plain wrong), as I can't quite explain the behaviour I encountered :
I decided to look for all words which consisted of letters from the string "LIVEN".
My output (with hash key) was as follows :
VENVILLE,4215328
LEVIN,4215328
ENLIVEN,4215328
CURTSEYED,37486648
How on earth did CURTSEYED land up there? As can be seen, it has a different hash value from the remaining three words. Where does the fault lie with my understanding/implementation of the hash table?
Code that produced above output:
typedef std::unordered_set< std::string, my_string_hash_function, my_string_equality> Dict
DictHash dict;
DictHash::const_local_iterator c_l_itr;
DictHash::size_type bs = dict.bucket (std::string ("LIVEN"));
for (c_l_itr = dict.begin(bs); c_l_itr != dict.end(bs); c_l_itr++)
std::cout
My hash function :
struct my_string_hash_function
{
std::size_t operator()(const std::string& s) const
{
unsigned long hash = 0;
std::string::const_iterator itr;
for (itr = s.begin(); itr != s.end(); itr++)
hash |= 2 << (*itr - int('A'));
return hash;
}
};
Comparison function :
struct my_string_equality { bool operator()(const std::string& s1, const std::string& s2) const { if (s1.length() != s2.length()) return false; unsigned int hash1 = 0, hash2 = 0; const char *str1, *str2; int i,len; len = s1.length(); str1 = s1.c_str(); str2 = s2.c_str(); for (i = 0; i < len; i++) { hash1 |= 2 << (str1[i] - (int)'A'); hash2 |= 2 << (str2[i] - (int)'A'); } return hash1 == hash2; } };
Upvotes: 1
Views: 509
Reputation: 546
I think you've also got a potential bug in the my_string_equality
... don't you just want to use the regular std::string::operator==()
? AFAIK you should be doing a comparison of the actual object values, not a comparison of their hash (the container already knows the hash value, it could just call my_string_hash_function
and compare the results if that was what it needed to do).
Upvotes: 0
Reputation: 54466
Different hash values will not necessarily end up in different buckets. Generally a hash table will choose a bucket based on hash_value % number_of_buckets
, so hashes that are equal modulo the number of buckets will wind up in the same bucket.
Essentially, you cannot guarantee anything about which hash value appears in which bucket.
Upvotes: 3