mug896

Reputation: 2025

Why are the results of hash() and hasher.write() not the same?

A number like 1234 produces the same result whether I use hash() or hasher.write(), but a byte slice like b"Cool" does not. I expected the two to match; why don't they?

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

fn main() {
    let mut hasher = DefaultHasher::new();
    1234.hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    hasher.write(unsafe { &mem::transmute::<i32, [u8; 4]>(1234) });
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    b"Cool".hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());
}
Hash is 702c1e2053bd76
Hash is 702c1e2053bd76
Hash is 9bf15988582e5a3f
Hash is 7fe67a564a06876a

Upvotes: 4

Views: 2008

Answers (1)

Stargateur

Reputation: 26757

As the documentation says:

The default Hasher used by RandomState. The internal algorithm is not specified, and so it and its hashes should not be relied upon over releases.

Following the link to RandomState:

A particular instance RandomState will create the same instances of Hasher, but the hashers created by two different RandomState instances are unlikely to produce the same result for the same values.

Rationale:

By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program. Because of this, the randomness of the seed depends on the output quality of the system's random number generator when the seed is created. In particular, seeds generated when the system's entropy pool is abnormally low such as during system boot may be of a lower quality.
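To make the quoted RandomState behavior concrete, here is a small sketch. The helper name hash_with is my own and not part of std. Hashers built from the same RandomState agree with each other, while a second RandomState will most likely (though this is not guaranteed) give a different value:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hypothetical helper: hash `value` with a fresh hasher built from `state`.
fn hash_with(state: &RandomState, value: &str) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = RandomState::new();
    let b = RandomState::new();

    // Same RandomState: every hasher it builds starts from the same keys.
    println!("a, first:  {:x}", hash_with(&a, "Cool"));
    println!("a, second: {:x}", hash_with(&a, "Cool"));

    // Different RandomState: seeded differently, so the value usually differs.
    println!("b:         {:x}", hash_with(&b, "Cool"));
}
```

The printed values change from run to run, which is why none are shown here.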


I dug into it a little bit and there is no requirement that hash() and write() share the same behavior.

The only requirement of the Hash trait is that k1 == k2 implies hash(k1) == hash(k2). The Hasher trait has the same property, but nothing requires that hashing a value with k.hash(&mut hasher) give the same result as passing its bytes to hasher.write().

That makes sense, as the Hash trait is intended to be implemented by the user, who can implement it however they like. For example, one might want to mix a salt into the hash.
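As a rough sketch of the salt idea: the type Salted, the constant SALT, and the helpers hash_of and raw_write below are all made up for illustration. Equal values still hash equally, which is all the Hash contract asks for, yet the result has no reason to match a plain write() of the same bytes:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical wrapper type whose Hash impl mixes in a salt.
#[derive(PartialEq, Eq)]
struct Salted<'a> {
    bytes: &'a [u8],
}

// Arbitrary value chosen for this example.
const SALT: u64 = 0x5eed;

impl<'a> Hash for Salted<'a> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the salt first, then the payload bytes.
        state.write_u64(SALT);
        state.write(self.bytes);
    }
}

// Hash any `Hash` value with a fresh DefaultHasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Feed raw bytes straight into a fresh DefaultHasher.
fn raw_write(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

fn main() {
    let a = Salted { bytes: b"Cool" };
    let b = Salted { bytes: b"Cool" };

    // k1 == k2 implies hash(k1) == hash(k2): the contract holds.
    assert!(a == b);
    assert_eq!(hash_of(&a), hash_of(&b));

    println!("salted hash(): {:x}", hash_of(&a));
    println!("raw write():   {:x}", raw_write(b"Cool"));
}
```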

Here is a minimal, complete, but not verifiable example (the RandomState values change from run to run). Each pair could print the same value or different values, depending on the implementation:

use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hasher, Hash};

fn main() {
    let s = RandomState::new();

    let mut hasher = s.build_hasher();
    b"Cool".hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = s.build_hasher();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());

    let s = DefaultHasher::new();

    let mut hasher = s.clone();
    b"Cool".hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = s.clone();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());
}

You can see that the implementation of Hash for a slice also writes the length of the slice:

#[stable(feature = "rust1", since = "1.0.0")]
impl<T: Hash> Hash for [T] {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.len().hash(state);
        Hash::hash_slice(self, state)
    }
}
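One way to see the length prefix directly is to plug in a hasher that only records the bytes it receives. RecordingHasher and the two helper functions below are hypothetical names for this sketch, and what it prints reflects how std currently behaves rather than a documented guarantee:

```rust
use std::hash::{Hash, Hasher};

// Hypothetical hasher that records every byte it is fed instead of hashing.
#[derive(Default)]
struct RecordingHasher {
    bytes: Vec<u8>,
}

impl Hasher for RecordingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not used in this sketch; only the recorded bytes matter.
    }
}

// Bytes the hasher receives when the slice goes through the Hash trait.
fn bytes_fed_by_hash(data: &[u8]) -> Vec<u8> {
    let mut rec = RecordingHasher::default();
    data.hash(&mut rec);
    rec.bytes
}

// Bytes the hasher receives from a direct write() call.
fn bytes_fed_by_write(data: &[u8]) -> Vec<u8> {
    let mut rec = RecordingHasher::default();
    rec.write(data);
    rec.bytes
}

fn main() {
    // hash() sends the length as a usize first, then the contents.
    println!("hash():  {:?}", bytes_fed_by_hash(b"Cool"));
    // write() sends only the contents.
    println!("write(): {:?}", bytes_fed_by_write(b"Cool"));
}
```

Since the two calls feed different byte sequences to the hasher, there is no reason for the final hashes to match.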

Also, hash_slice() appears to have the behavior you want, although the documentation does not state that this will always be the case. I think this is the intended behavior and that it will not change; I asked here.

use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

fn main() {
    let s = DefaultHasher::new();

    let mut hasher = s.clone();
    std::hash::Hash::hash_slice(b"Cool", &mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = s.clone();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());
}

Upvotes: 4
