ali_bahoo
ali_bahoo

Reputation: 4863

Interesting stdext::hash_value() implementation

I was trying some hashing algorithms with stdext::hash_value() on VS2010 and realized this:

#include <iostream>
#include <xhash>
using namespace std;

int main() 
{
#ifdef _WIN64
    std::cout << "x64" << std::endl;
#else
    std::cout << "x32" << std::endl;
#endif

    std::cout << stdext::hash_value(345.34533) << std::endl;
    std::cout << stdext::hash_value(345.566) << std::endl;

    return 0;
}

// Output is:
// x64
//3735928758
//3735928758

I tried some other couples of double variable that has the same integer but different fractional part. Like 1.234 vs 1.568. Hash values were always the same. So I took a look at source of hash_value() and saw

#define _HASH_SEED  (size_t)0xdeadbeef

template<class _Kty> inline
size_t hash_value(const _Kty& _Keyval)
{   // hash _Keyval to size_t value one-to-one
    return ((size_t)_Keyval ^ _HASH_SEED);
}

_KeyVal is cast to size_t which does not make sense to me. The function simply ignores the fractional part of double. What is the logic behind this implementation? Is it a bug or feature?

Upvotes: 3

Views: 2699

Answers (5)

Puppy
Puppy

Reputation: 146910

VC10 contains the C++0x Standard hashing mechanisms, so there's no need to use the stdext ones anymore, and std::hash does contain a mechanism for double which performs a bitwise conversion and then hashes. That code that you have for stdext is just a fallback mechanism, and it's not really intended for use with floating-point types. I guess it's a design oversight.

template<>
class hash<double>
    : public unary_function<double, size_t>
{   // hash functor
public:
typedef double _Kty;
typedef _ULonglong _Inttype;    // use first 2*32 bits

size_t operator()(const _Kty& _Keyval) const
    {   // hash _Keyval to size_t value by pseudorandomizing transform
    _Inttype _Bits = *(_Inttype *)&_Keyval;
    return (hash<_Inttype>()(
        (_Bits & (_ULLONG_MAX >> 1)) == 0 ? 0 : _Bits));
    }
};

Upvotes: 0

BCoates
BCoates

Reputation: 1518

stdext::hash_value isn't a hash function. It's the input to a hash function, you specialize it for your type so that it can be used as a key for the stdext hash classes. There doesn't seem to be any documentation for it however. The actual hash function is stdext::hash_compare.

But because there is no default specialization of hash_value it uses the convert-to-int method which ignores the fractional part.

There is an almost-identical bug for the standard std::tr1::hash/std::hash function up until vc10:

http://connect.microsoft.com/VisualStudio/feedback/details/361851/std-tr1-hash-float-or-hash-double-is-poorly-implemented

in vc10 std::hash gets a double specialization that hashes the bits. I guess stdext is considered obsolete now so there's no fix for it even in vc10.

Upvotes: 2

James Kanze
James Kanze

Reputation: 153899

This is, apparently, an attempt to provide a generic hash function for integers (although I don't see what the xor adds). It quite clearly won't work for most other types. Including floating point.

Providing a good hash function for a floating point value is difficult; if I were trying to make a generic hash, I'd probably start by testing for 0, NaN and Inf, and returning predefined hashes for those (or rejecting NaN completely as it is not a valid hashable value), and then simply using a standard string hash on the underlying bytes. This will at least make hashing compatible with the == operator. But the problems of precision mean that == itself may not be what is needed. Nor <, in the case of std::map, since std::map uses < to define an equality relationship, and depending on the source of the floats or doubles, such an equality relationship might not be appropriate for a hash table.

At any rate, don't expect to find a standard hash function for the floating point types.

Upvotes: 0

Bo Persson
Bo Persson

Reputation: 92211

That's old code - doesn't seem very good.

You should probably try std::hash instead.

Upvotes: 0

JaredPar
JaredPar

Reputation: 754525

The function is written to work with any type of data. It makes no assumption about the size and hence is inefficient for certain types. You can override this behavior for doubles to make it more efficient via a template specialization

template<>
size_t hash_value<double>(const double& key) {
    return // Some awesome double hashing algorithm 
}

Putting this definition above your main method will cause the calls to stdext::hash_value(354.566) to bind to this definition as opposed to the standard one

Upvotes: 1

Related Questions