Reputation: 41972
I have a Dictionary
with a custom hashing function. I want to test the hash function, because even though it returns different hash results for my test values, some of them may still map to the same bucket due to the modulo %
operation. So how to check if there are collisions in C# Dictionary
with custom hash function and improve that function?
This is a development test to fine-tune the hash function and won't go into production so no worries about the changes in internal implementation in other versions!!!
In C++ it's possible to get the map's bucket size to check the collision status but I couldn't find a way to do that in C#. How can I know if Dictionary
has been collided?
Upvotes: -1
Views: 800
Reputation: 8815
You're probably better off creating a custom Dictionary
implementation that changes the Add
and Remove
methods to check for hash collisions based on the computer GetHashCode
of the elements. You can compose with a "real" Dictionary
internally to do the real work of storing the elements.
Here's a sample version. You could optimize the Add
and Remove
methods depending on the type of hashes your expecting.
public class CollisionDetectingDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
private readonly Dictionary<TKey, TValue> InternalDictionary = new Dictionary<TKey, TValue>();
private readonly List<int> HashCodesInDictionary = new List<int>();
public event Action<int, TKey, IEnumerable<TKey>> HashCollision;
public TValue this[TKey key] { get => InternalDictionary[key]; set => InternalDictionary[key] = value; }
public ICollection<TKey> Keys => InternalDictionary.Keys;
public ICollection<TValue> Values => InternalDictionary.Values;
public int Count => InternalDictionary.Count;
public bool IsReadOnly => false;
public void Add(TKey key, TValue value)
{
Add(new KeyValuePair<TKey, TValue>(key, value));
}
public void Add(KeyValuePair<TKey, TValue> item)
{
var hashCode = item.Key.GetHashCode();
if (HashCodesInDictionary.Contains(hashCode))
{
var collisions = GetKeysByHashCode(hashCode);
HashCollision?.Invoke(hashCode, item.Key, collisions);
}
Add(item);
}
private IEnumerable<TKey> GetKeysByHashCode(int hashCode)
{
foreach (var key in Keys)
{
if(key.GetHashCode() == hashCode)
{
yield return key;
}
}
}
public void Clear()
{
InternalDictionary.Clear();
}
public bool Contains(KeyValuePair<TKey, TValue> item)
{
return InternalDictionary.Contains(item);
}
public bool ContainsKey(TKey key)
{
return InternalDictionary.ContainsKey(key);
}
public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex)
{
((IDictionary<TKey,TValue>)InternalDictionary).CopyTo(array, arrayIndex);
}
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
{
return InternalDictionary.GetEnumerator();
}
public bool Remove(TKey key)
{
var hashCode = key.GetHashCode();
if(GetKeysByHashCode(hashCode).Count() == 1)
{
HashCodesInDictionary.Remove(hashCode);
}
return InternalDictionary.Remove(key);
}
public bool Remove(KeyValuePair<TKey, TValue> item)
{
return Remove(item.Key);
}
public bool TryGetValue(TKey key, out TValue value)
{
return InternalDictionary.TryGetValue(key, out value);
}
IEnumerator IEnumerable.GetEnumerator()
{
return InternalDictionary.GetEnumerator();
}
}
Upvotes: 4
Reputation: 2482
You can get internal buckets in the following way:
var dictionary = new Dictionary<string, int>();
dictionary.Add("a", 8);
dictionary.Add("b", 1);
var buckets = dictionary.GetType().GetField("_buckets", BindingFlags.NonPublic | BindingFlags.Instance)
.GetValue(dictionary); // use "buckets" for 4.x
Upvotes: 5