Reputation: 2025
A number like 1234 has the same results if I use either the hash() or hasher.write() functions, but a byte slice like b"Cool" does not. I think it should be the same; why is it not?
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem;

fn main() {
    let mut hasher = DefaultHasher::new();
    1234.hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    hasher.write(unsafe { &mem::transmute::<i32, [u8; 4]>(1234) });
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    b"Cool".hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = DefaultHasher::new();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());
}
Hash is 702c1e2053bd76
Hash is 702c1e2053bd76
Hash is 9bf15988582e5a3f
Hash is 7fe67a564a06876a
Upvotes: 4
Views: 2008
Reputation: 26757
As the documentation says:
"The default Hasher used by RandomState. The internal algorithm is not specified, and so it and its hashes should not be relied upon over releases."
If we follow RandomState...
"A particular instance RandomState will create the same instances of Hasher, but the hashers created by two different RandomState instances are unlikely to produce the same result for the same values."
"By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program. Because of this, the randomness of the seed depends on the output quality of the system's random number generator when the seed is created. In particular, seeds generated when the system's entropy pool is abnormally low such as during system boot may be of a lower quality."
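The quoted RandomState behavior can be checked with a small sketch. The helper name hash_with is made up for this illustration; only RandomState, BuildHasher and Hasher come from the standard library.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hashes `bytes` with a fresh hasher built from `state`.
/// (`hash_with` is a hypothetical helper, not a std function.)
fn hash_with(state: &RandomState, bytes: &[u8]) -> u64 {
    let mut hasher = state.build_hasher();
    hasher.write(bytes);
    hasher.finish()
}

fn main() {
    let s1 = RandomState::new();
    let s2 = RandomState::new();

    // Hashers built from the same RandomState start from the same keys,
    // so the same input gives the same result.
    assert_eq!(hash_with(&s1, b"Cool"), hash_with(&s1, b"Cool"));

    // Hashers built from different RandomState instances are unlikely to
    // agree, and the printed values change from run to run.
    println!("s1: {:x}", hash_with(&s1, b"Cool"));
    println!("s2: {:x}", hash_with(&s2, b"Cool"));
}
```

No expected output is shown because the seeds are random.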
I dug into it a little, and there is no requirement that hash() and write() share the same behavior.
The only requirement for the Hash trait is that k1 == k2 -> hash(k1) == hash(k2). The Hasher trait has the same property, but nothing requires that hashing a value with Hash::hash() gives the same result as passing its bytes to Hasher::write().
That makes sense, as the Hash trait is intended to be implemented by the user, who can implement it as they like. For example, one could want to add a salt to the hash.
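As a rough sketch of the salt idea, the type Salted and the helpers hash_of and raw_write below are invented for this illustration. The Hash impl stays consistent with Eq (equal values hash equally), yet it feeds the hasher something other than the raw bytes:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical type for illustration: a byte slice hashed together with a salt.
#[derive(PartialEq, Eq)]
struct Salted<'a> {
    salt: u64,
    data: &'a [u8],
}

impl Hash for Salted<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // The implementer decides what is fed to the hasher:
        // here the salt first, then the raw bytes.
        state.write_u64(self.salt);
        state.write(self.data);
    }
}

/// Hashes a value through its `Hash` implementation.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hashes raw bytes by calling `Hasher::write` directly.
fn raw_write(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

fn main() {
    let a = Salted { salt: 1, data: b"Cool" };
    let b = Salted { salt: 1, data: b"Cool" };

    // The Hash contract: equal values produce equal hashes.
    assert!(a == b);
    assert_eq!(hash_of(&a), hash_of(&b));

    // Nothing ties Hash::hash to Hasher::write on the same bytes.
    println!("hash():  {:x}", hash_of(&a));
    println!("write(): {:x}", raw_write(b"Cool"));
}
```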
Here is a minimal, complete, but not verifiable example that could produce either the same output or different output, depending on the implementation:
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hasher, Hash};

fn main() {
    let s = RandomState::new();

    let mut hasher = s.build_hasher();
    b"Cool".hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = s.build_hasher();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());

    let s = DefaultHasher::new();

    let mut hasher = s.clone();
    b"Cool".hash(&mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = s.clone();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());
}
You can see that the implementation of Hash for a slice also writes the length of the slice:
#[stable(feature = "rust1", since = "1.0.0")]
impl<T: Hash> Hash for [T] {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.len().hash(state);
        Hash::hash_slice(self, state)
    }
}
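One way to see the extra length bytes is a toy hasher that records what it is fed instead of hashing it. Recorder, fed_by_hash and fed_by_write are names made up for this sketch, and the exact encoding of the length prefix is an implementation detail of the standard library, not a documented guarantee.

```rust
use std::hash::{Hash, Hasher};

/// A toy hasher (hypothetical, for illustration) that records every byte
/// it receives instead of computing a real hash.
#[derive(Default)]
struct Recorder {
    bytes: Vec<u8>,
}

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // not meaningful for this sketch
    }
}

/// Bytes the hasher receives when a byte slice goes through `Hash::hash`.
fn fed_by_hash(data: &[u8]) -> Vec<u8> {
    let mut rec = Recorder::default();
    data.hash(&mut rec);
    rec.bytes
}

/// Bytes the hasher receives from a direct `Hasher::write` call.
fn fed_by_write(data: &[u8]) -> Vec<u8> {
    let mut rec = Recorder::default();
    rec.write(data);
    rec.bytes
}

fn main() {
    // With the slice impl quoted above, hash() is expected to feed a
    // length prefix before the contents; write() feeds only the contents.
    println!("hash():  {:?}", fed_by_hash(b"Cool"));
    println!("write(): {:?}", fed_by_write(b"Cool"));
}
```

Because the two calls feed different byte sequences to the hasher, there is no reason for the resulting hashes to match.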
Also, it looks like hash_slice() has the behavior you want, but the documentation does not state that this will always be the case (I think this is the intended behavior and that it will not change; I asked here).
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

fn main() {
    let s = DefaultHasher::new();

    let mut hasher = s.clone();
    std::hash::Hash::hash_slice(b"Cool", &mut hasher);
    println!("Hash is {:x}", hasher.finish());

    let mut hasher = s.clone();
    hasher.write(b"Cool");
    println!("Hash is {:x}", hasher.finish());
}
Upvotes: 4