Rabennarbe

Reputation: 42

HashSet does not work correctly, reason or alternatives?

I have a HashSet<Error> named ErrorList. Error has the properties File and Block. I fill the HashSet with a number of errors, some of which are exactly identical and therefore repeated. The HashSet accepts these duplicates without complaint. As a last attempt I created a separate list and called Distinct() on it: List<Error> noDupes = ErrorList.Distinct().ToList(); but that list still contains the duplicates as well. Why does neither the HashSet nor my noDupes list remove them? Are there alternative solutions?

Here's the important part of my code:

        #region Properties
        HashSet<Error> ErrorList { get; set; } = new HashSet<Error>();
        private Stopwatch StopWatch { get; set; } = new Stopwatch();
        private string CSVFile { get; set; } = null;
        int n;
        #endregion
                    ErrorList.Add(new Error
                    {
                        File = x,
                        Block = block
                    });

                    n = FileCall.IndexOf(i);
                    int p = n * 100 / FileCall.Count;
                    SetConsoleProgress(n.ToString("N0"), p);
                }
            } 

            int nx = 0;
            List<Error> noDupes = ErrorList.Distinct().ToList();

The Error-Class:

namespace ApplicationNamespace
{
    public class Error
    {
        public string File { set; get; }
        public int Block { set; get; }
    }
}

Upvotes: 0

Views: 1193

Answers (1)

galdin

Reputation: 14034

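By default a class is compared by reference, so two Error instances with the same File and Block are still different objects as far as HashSet<> and Distinct() are concerned. Override the default Equals() and GetHashCode() implementations (as others have mentioned in the comments) for HashSet<> or Distinct() to work. You can also implement IEquatable<>, which adds a strongly typed Equals(); you should still override Equals(object) and GetHashCode() so that all of them stay consistent.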

using System;

public class Error : IEquatable<Error>
{
    public string File { set; get; }
    public int Block { set; get; }

    // Typed equality; HashSet<Error> and Distinct() pick this up
    // through EqualityComparer<Error>.Default.
    public bool Equals(Error other)
    {
        // Check whether the compared object is null.
        if (Object.ReferenceEquals(other, null)) return false;

        // Check whether the compared object references the same data.
        if (Object.ReferenceEquals(this, other)) return true;

        // Check whether the error's properties are equal.
        return File == other.File && Block == other.Block;
    }

    // Keep the untyped overload consistent with Equals(Error).
    public override bool Equals(object obj)
    {
        return Equals(obj as Error);
    }

    // If Equals() returns true for a pair of objects
    // then GetHashCode() must return the same value for these objects.
    public override int GetHashCode()
    {
        return $"{Block}-{File}".GetHashCode(); // adjust this as you see fit
    }
}

Reference: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.distinct?view=netcore-3.1
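
Since the question also asks for alternatives: if you would rather not touch the Error class, you can pass a custom IEqualityComparer<Error> to the HashSet<> constructor and to Distinct(). The following is only a rough sketch. ErrorComparer and ComparerDemo are names I made up for illustration, and HashCode.Combine assumes .NET Core 2.1 or later (or .NET Standard 2.1).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's class, left unchanged (no Equals/GetHashCode overrides).
public class Error
{
    public string File { get; set; }
    public int Block { get; set; }
}

// Hypothetical comparer: two errors are equal when File and Block match.
public sealed class ErrorComparer : IEqualityComparer<Error>
{
    public bool Equals(Error x, Error y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.File == y.File && x.Block == y.Block;
    }

    public int GetHashCode(Error obj)
    {
        if (obj is null) return 0;
        // HashCode.Combine tolerates a null File.
        return HashCode.Combine(obj.File, obj.Block);
    }
}

// Hypothetical demo helper, only here to show the difference in counts.
public static class ComparerDemo
{
    public static (int ByReference, int ByValue, int DistinctCount) Run()
    {
        var a = new Error { File = "a.txt", Block = 1 };
        var b = new Error { File = "a.txt", Block = 1 }; // same values, different instance

        // Default comparer: reference equality, so both instances are kept.
        var byReference = new HashSet<Error> { a, b };

        // Custom comparer: the second instance is treated as a duplicate.
        var byValue = new HashSet<Error>(new ErrorComparer()) { a, b };
        var distinct = new List<Error> { a, b }.Distinct(new ErrorComparer()).ToList();

        return (byReference.Count, byValue.Count, distinct.Count);
    }
}
```

ComparerDemo.Run() should give 2 for the plain HashSet and 1 for both comparer-based variants, which matches the behaviour described in the question.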

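Remember to handle null values in the File string if you hash it directly, for example with File.GetHashCode(). (You could replace null with String.Empty, for instance.) It's also common to cache the hash code in a private field, so that once calculated the cached value can be returned on subsequent calls to GetHashCode(). For this you will most likely also need to make the class immutable, because changing File or Block after the object has been added to a HashSet would leave it stored under a stale hash code.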
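
To make the caching idea concrete, here is a minimal sketch of an immutable variant that normalizes a null File and computes the hash once in the constructor. The constructor signature, the _hashCode field and CachedHashDemo are my own assumptions, not part of the question's code, and HashCode.Combine again assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

public sealed class Error : IEquatable<Error>
{
    private readonly int _hashCode; // computed once, safe because the object never changes

    public Error(string file, int block)
    {
        File = file ?? string.Empty; // normalize null so hashing and comparing stay simple
        Block = block;
        _hashCode = HashCode.Combine(File, Block);
    }

    // Get-only properties: the values cannot change after construction.
    public string File { get; }
    public int Block { get; }

    public bool Equals(Error other) =>
        !(other is null) && File == other.File && Block == other.Block;

    public override bool Equals(object obj) => Equals(obj as Error);

    public override int GetHashCode() => _hashCode;
}

// Hypothetical demo helper.
public static class CachedHashDemo
{
    public static int Run()
    {
        var set = new HashSet<Error>
        {
            new Error("a.txt", 1),
            new Error("a.txt", 1), // duplicate by value
            new Error(null, 2),    // null File is stored as ""
            new Error("", 2),      // so this one is a duplicate too
        };
        return set.Count;
    }
}
```

Whether the caching is worth it depends on how often the objects get hashed; for a small class like this, computing the hash on every call is usually fine as well.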

(You won't have to do any of this with C# 9's record types, which generate value-based Equals() and GetHashCode() for you.)
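
For completeness, a small sketch of the record version, assuming C# 9 on .NET 5 or later. RecordDemo is just a hypothetical helper to show the resulting count.

```csharp
using System.Collections.Generic;

// Positional record: the compiler generates value-based Equals and GetHashCode.
public record Error(string File, int Block);

// Hypothetical demo helper.
public static class RecordDemo
{
    public static int Run()
    {
        var set = new HashSet<Error>
        {
            new Error("a.txt", 1),
            new Error("a.txt", 1), // equal by value, so it is not added again
            new Error("b.txt", 1),
        };
        return set.Count;
    }
}
```

Note that the positional properties are init-only, so object initializers like new Error { File = x, Block = block } would have to become constructor calls.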

Upvotes: 1
