Nathan
Nathan

Reputation: 6216

.NET inconsistent hash output

I have a class representing a unique real-world object, its name is personally identifiable information (a vehicle license plate if you're interested) so as a basic first step I am hashing the name and using that instead. (I know - salt required etc. - this is just a foundation)

I have a test which instantiates the object with a fixed input (a test name - "A") and asserts that the Id (Base 64 string of the hash output) is as expected. However, it occasionally fails(!!)

I dug a bit deeper and here's a screenshot of a conditional breakpoint which breaks only when the hash output isn't the norm. The input is still as expected (the 'bytes' variable contains { 65 }, but the output is different from the normal sha384 output hash (normally it's "rRSq8lAgvvL9Tj617AxQJyzf1mB0sO0DfJoRJUMhqsBymYU3S+6qW4ClBNBIvhhk")

Lines 19-25 are split up a bit to facilitate debugging, but otherwise this class is as it should be.

Any clues as to how this is possible would be very welcome. Running Windows 11, using .NET 7 and the latest version of Visual Studio Enterprise.

Here's a pic of a conditional breakpoint being hit where hash output is not the norm: Conditional breakpoint hit where hash output is not the norm

Here's the code should anyone wish to try to reproduce it (note it's not consistent: it's just occasional)

using System.Security.Cryptography;
using System.Text;

namespace Domain.Models.Object
{
    /// <summary>
    /// One-way identifier
    /// </summary>
    public record ObjectIdentifier
    {
        private static SHA384 hash = SHA384.Create();

        public ObjectIdentifier(string name)
        {
            if (name == null)
            {
                throw new ArgumentNullException(nameof(name));
            }
            var bytes = Encoding.UTF8.GetBytes(name);


            Id = Convert.ToBase64String(hash.ComputeHash(bytes));

            int x = 0;
            x++;
        }

        public string Id { get; init; }

        public override string ToString()
        {
            return Id;
        }
    }
}

And here's the test:

[Fact]
public void ObjectIdentifier_AsExpected()
{
    // arrange
    var obj = new ObjectIdentifier("A");

    // assert
    Assert.Equal("rRSq8lAgvvL9Tj617AxQJyzf1mB0sO0DfJoRJUMhqsBymYU3S+6qW4ClBNBIvhhk", obj.Id);
}

I also note that the new hash value is not consistently the same: here's another failure with a different output from the previous screenshot:

Another test failure

I also note that adding this to line 20 causes the inconsistency to stop happening altogether... Unfortunately, this is not a suitable fix :P

Debug.Assert(bytes.Length == 1 && bytes[0] == 65)

Update The invalid outputs appear to just be the two supplied above, I haven't observed any further variants.

Also, changing it to be a class (instead of a record) makes no difference.

I'm also observing this effect in a test console app which has only two identifiers supplied to it, but in fact more than two hashes are output:

inconsistent hash output

Upvotes: 0

Views: 236

Answers (1)

Nathan
Nathan

Reputation: 6216

So, as Matt kindly pointed out in the comments, this operation isn't thread-safe. The shared worker object (shared because it was static) was to blame. An easy one to miss - and a great example of why unit tests are a must!

I opted to use an object pool to get a unique worker per instantiation while allowing for some reduction in allocation overhead:

using Domain.Implementations;
using Microsoft.Extensions.ObjectPool;
using System.Text;

namespace Domain.Models.Object
{
    /// <summary>
    /// One-way identifier
    /// </summary>
    public class ObjectIdentifier
    {
        private static ObjectPool<HashWrapper> hashObjects = ObjectPool.Create<HashWrapper>();

        public ObjectIdentifier(string name)
        {
            if (name == null)
            {
                throw new ArgumentNullException(nameof(name));
            }

            var hashworker = hashObjects.Get();
            try
            {
                Id = Convert.ToBase64String(hashworker.Value.ComputeHash(Encoding.UTF8.GetBytes(name)));
            }
            finally
            {
                hashObjects.Return(hashworker);
            }            
        }

        public string Id { get; init; }

        public override string ToString()
        {
            return Id;
        }
    }
}

The hash wrapper class (wraps the worker and provides a public constructor for the object pool)

internal class HashWrapper
{
    public HashWrapper()
    {
        Value = SHA384.Create();
    }

    public HashAlgorithm Value { get; private set; }
}

Update

This implementation with a shared pool provided a (very minor) increase in speed and memory allocation when constructing higher numbers of objects. For smaller numbers, there is a (very minor) performance penalty. YMMV - see below.

This benchmark ran a version with Shared = true (normal lock on a shared object), false (new sha worker instance for each instantiation) and null [? in the column] : to signal using an object pool

Benchmark results

Upvotes: 2

Related Questions