phuclv
phuclv

Reputation: 41972

How to evaluate custom hash function?

I have a Dictionary with a custom hashing function. I want to test the hash function, because even though it returns different hash results for my test values, some of them may still map to the same bucket due to the modulo % operation. So how to check if there are collisions in C# Dictionary with custom hash function and improve that function?

This is a development test to fine-tune the hash function and won't go into production so no worries about the changes in internal implementation in other versions!!!

In C++ it's possible to get the map's bucket size to check the collision status but I couldn't find a way to do that in C#. How can I know if Dictionary has been collided?

Upvotes: -1

Views: 800

Answers (2)

just.another.programmer
just.another.programmer

Reputation: 8815

You're probably better off creating a custom Dictionary implementation that changes the Add and Remove methods to check for hash collisions based on the computer GetHashCode of the elements. You can compose with a "real" Dictionary internally to do the real work of storing the elements.

Here's a sample version. You could optimize the Add and Remove methods depending on the type of hashes your expecting.

public class CollisionDetectingDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> InternalDictionary = new Dictionary<TKey, TValue>();
    private readonly List<int> HashCodesInDictionary = new List<int>();

    public event Action<int, TKey, IEnumerable<TKey>> HashCollision; 

    public TValue this[TKey key] { get => InternalDictionary[key]; set => InternalDictionary[key] = value; }
    public ICollection<TKey> Keys => InternalDictionary.Keys;
    public ICollection<TValue> Values => InternalDictionary.Values;
    public int Count => InternalDictionary.Count;
    public bool IsReadOnly => false;

    public void Add(TKey key, TValue value)
    {
        Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public void Add(KeyValuePair<TKey, TValue> item)
    {
        var hashCode = item.Key.GetHashCode();
        if (HashCodesInDictionary.Contains(hashCode))
        {
            var collisions = GetKeysByHashCode(hashCode);
            HashCollision?.Invoke(hashCode, item.Key, collisions);
        }

        Add(item);
    }

    private IEnumerable<TKey> GetKeysByHashCode(int hashCode)
    {
        foreach (var key in Keys)
        {
            if(key.GetHashCode() == hashCode)
            {
                yield return key;
            }
        }
    }

    public void Clear()
    {
        InternalDictionary.Clear();
    }

    public bool Contains(KeyValuePair<TKey, TValue> item)
    {
        return InternalDictionary.Contains(item);
    }

    public bool ContainsKey(TKey key)
    {
        return InternalDictionary.ContainsKey(key);
    }

    public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex)
    {
        ((IDictionary<TKey,TValue>)InternalDictionary).CopyTo(array, arrayIndex);
    }

    public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
    {
        return InternalDictionary.GetEnumerator();
    }

    public bool Remove(TKey key)
    {
        var hashCode = key.GetHashCode();
        if(GetKeysByHashCode(hashCode).Count() == 1)
        {
            HashCodesInDictionary.Remove(hashCode);
        }

        return InternalDictionary.Remove(key);
    }

    public bool Remove(KeyValuePair<TKey, TValue> item)
    {
        return Remove(item.Key);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        return InternalDictionary.TryGetValue(key, out value);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return InternalDictionary.GetEnumerator();
    }
}

Upvotes: 4

Cihan Yakar
Cihan Yakar

Reputation: 2482

You can get internal buckets in the following way:

var dictionary = new Dictionary<string, int>();
dictionary.Add("a", 8);
dictionary.Add("b", 1);
var buckets = dictionary.GetType().GetField("_buckets", BindingFlags.NonPublic | BindingFlags.Instance)
              .GetValue(dictionary); // use "buckets" for 4.x

Upvotes: 5

Related Questions