Reputation: 794
What's the best way to implement a function that, given an object, returns a hash key?
The requirements would be:
HashCodeFn(((bool?)false, "example")) != HashCodeFn(((bool?)null, "example"))
It must not depend on the [Serializable] attribute.
I've tried .GetHashCode, but it does not distinguish null vs 0 vs false.
I've tried with:
private static int GetHashKey<T>(T input)
{
    using var memoryStream = new MemoryStream();
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(memoryStream, input);
    memoryStream.Position = 0;
    using var reader = new StreamReader(memoryStream);
    return reader.ReadToEnd().GetHashCode();
}
However, this requires the [Serializable] attribute (some types I have no control over and don't implement it).
I'm thinking of serialising the object to JSON in the most compact form and then taking the GetHashCode of that string, but I'm not sure how well that works with something like NodaTime.Instant. Is that the fastest way to accomplish this?
This is used as a data loader key (see github.com/graphql/dataloader for an example), if that helps to understand the use case.
Specifically, a data loader key is used to handle batching. When you have many requests with input (a, b, c) and you want to "pivot" on, for example, a (which means that (1, b, c), (2, b, c), (3, b, c) should call a batch function fn([1, 2, 3], (b, c))), then you need to be able to define a key that is the same for the same values of (b, c) to use as the data loader key.
From the input perspective, specifying a bool for something like b, or leaving it unspecified, is considered two different things, and the two cases should be batched separately.
If I were to use (b, c).GetHashCode(), then it would consider ((bool?)false, "ok") and ((bool?)null, "ok") to be the same thing, batching them into the same batch function call and yielding unexpected results.
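To make the collision concrete, here is a minimal sketch. CollisionDemo and TupleHash are made-up names for illustration only; the values are the ones from the example above.

```csharp
using System;

static class CollisionDemo
{
    // Hypothetical helper: hashes the (b, c) part of the input as a value tuple.
    public static int TupleHash(bool? b, string c) => (b, c).GetHashCode();

    static void Main()
    {
        // A null bool?, the int 0 and false all report the same hash code.
        Console.WriteLine(((bool?)null).GetHashCode()); // 0
        Console.WriteLine(0.GetHashCode());             // 0
        Console.WriteLine(false.GetHashCode());         // 0

        // So the two tuples get the same hash code within one process.
        Console.WriteLine(TupleHash(false, "ok") == TupleHash(null, "ok")); // True
    }
}
```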
Upvotes: 1
Views: 99
Reputation: 119
I don't think there's any particularly efficient way to do what you want. Some additional processing will be required to make sure you get appropriate hash codes. Also keep in mind that if the classes you don't control already implement Equals and GetHashCode, and Equals returns true for instances that differ only by something like a nullable boolean being false or null, then it's incorrect for GetHashCode to return different values for them.
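It may also be worth separating hash codes from equality here. Equal hash codes only mean two keys land in the same bucket; Equals still decides whether they are the same key. The sketch below assumes the batching layer can take the tuple itself as the key and compares keys with default equality, which I have not verified for any particular data loader library. BatchByTupleDemo and Batch are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BatchByTupleDemo
{
    // Hypothetical batching step: pivot on A, group on the (B, C) tuple itself
    // rather than on its int hash code.
    public static Dictionary<(bool? B, string C), List<int>> Batch(
        IEnumerable<(int A, bool? B, string C)> requests) =>
        requests.GroupBy(r => (r.B, r.C))
                .ToDictionary(g => g.Key, g => g.Select(r => r.A).ToList());

    static void Main()
    {
        var batches = Batch(new (int A, bool? B, string C)[]
        {
            (1, false, "ok"), (2, false, "ok"), (3, null, "ok"),
        });

        // false and null share a hash code but are not equal, so they stay apart.
        Console.WriteLine(batches.Count);                             // 2
        Console.WriteLine(string.Join(",", batches[(false, "ok")]));  // 1,2
        Console.WriteLine(string.Join(",", batches[(null, "ok")]));   // 3
    }
}
```

If the key has to be a single int, this does not help, and something like the approaches below is still needed.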
You could serialise to JSON to achieve what you want. That will exclude any fields that are annotated for exclusion; as long as none of the fields relevant to the hash code are excluded, that will work. Alternatively, you could write extension methods for the types that are going to cause clashes and customise the hashing for those. Then use reflection (which is likely to be used in serialising to JSON as well) to iterate over the class members and get hash codes, using your extensions where necessary. Something along the lines of the code below.
using System;
using System.Reflection;

class ThingToHash
{
    public bool? CouldBeFalseOrNull { get; }
    public int IncludesZero { get; }
    public string CanBeEmptyOrNull { get; }
    // Never assigned, so always null; the reflection code still sees it.
    private string Hidden { get; }

    public ThingToHash(bool? couldBeFalseOrNull, int includesZero, string canBeEmptyOrNull)
    {
        CouldBeFalseOrNull = couldBeFalseOrNull;
        IncludesZero = includesZero;
        CanBeEmptyOrNull = canBeEmptyOrNull;
    }
}
static class StringExtensions
{
    // A null string gets an arbitrary sentinel (17) instead of throwing.
    public static int GetAltHashCode(this string toHash)
    {
        return toHash?.GetHashCode() ?? 17;
    }
}

static class NullableBoolExtensions
{
    // null gets its own sentinel (19); non-null values go through BoolExtensions.
    public static int GetAltHashCode(this bool? toHash)
    {
        return toHash?.GetAltHashCode() ?? true.GetHashCode() * 19;
    }
}

static class BoolExtensions
{
    // false normally hashes to 0, which clashes with null and 0, so remap it.
    public static int GetAltHashCode(this bool toHash)
    {
        if (false == toHash)
        {
            return true.GetHashCode() * 17;
        }
        return toHash.GetHashCode();
    }
}
class Program
{
    static void Main(string[] args)
    {
        // The built-in hash codes clash even though the values are not equal.
        Console.WriteLine(false.GetHashCode());
        Console.WriteLine(((bool?)null).GetHashCode());
        Console.WriteLine(false == (bool?)null);
        // These two differ only in null vs false and get different hashes.
        Console.WriteLine(HashUnknownObject(new ThingToHash(null, 0, "")));
        Console.WriteLine(HashUnknownObject(new ThingToHash(false, 0, "")));
        Console.ReadKey();
    }

    static int HashUnknownObject(Object toHash)
    {
        // Include public and non-public instance properties.
        PropertyInfo[] members = toHash.GetType().GetProperties(BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);
        int hash = 17;
        foreach (PropertyInfo memberToHash in members)
        {
            object memberVal = memberToHash.GetValue(toHash);
            if (null == memberVal)
            {
                // Null values of other property types are not handled yet
                // and contribute nothing to the hash.
                if (typeof(bool?) == memberToHash.PropertyType)
                {
                    hash += 31 * ((bool?)null).GetAltHashCode();
                }
                else if (typeof(string) == memberToHash.PropertyType)
                {
                    hash += 31 * ((string)null).GetAltHashCode();
                }
            }
            else
            {
                // Non-null values still use the built-in GetHashCode.
                hash += 31 * memberVal.GetHashCode();
            }
        }
        return hash;
    }
}
You'd obviously have to add other checks to use the bool extension, add further extensions, and so on, to cover the cases you need. You should also test the performance impact of using reflection. You could reduce that impact for classes that already implement GetHashCode, for instance, by not generating hash codes per member for those.
This code can obviously be cleaned up; it's just quick and dirty.
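For comparison, the JSON route mentioned earlier could look roughly like this. It is a sketch, assuming System.Text.Json is available; Filter and ToKey are made-up names, and Filter just stands in for the (b, c) part of the input.

```csharp
using System;
using System.Text.Json;

static class JsonKeyDemo
{
    // Hypothetical key type; any type with public properties would do.
    public class Filter
    {
        public bool? B { get; set; }
        public string C { get; set; }
    }

    // Serialises to compact JSON (the default is not indented).
    // Nulls are written by default, so null and false produce different text.
    public static string ToKey<T>(T input) => JsonSerializer.Serialize(input);

    static void Main()
    {
        Console.WriteLine(ToKey(new Filter { B = false, C = "ok" })); // {"B":false,"C":"ok"}
        Console.WriteLine(ToKey(new Filter { B = null, C = "ok" }));  // {"B":null,"C":"ok"}
    }
}
```

Using the string itself as the key, rather than its GetHashCode, avoids accidental collisions between different strings. Two caveats I'm fairly sure about: value tuples expose fields rather than properties, so they serialise as empty objects unless the IncludeFields option is set, and types such as NodaTime.Instant will probably need a dedicated converter.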
Upvotes: 1