Reputation: 175
I came across this strange behaviour in C# vs Java. The HashSet in Java seems to compare the contents in an object and reject duplicates or gives out a boolean value for when it is checked for contains. The same isn't true in C#. Here are the code examples:
Java code:
HashSet<List<Integer>> setOfLists = new HashSet<List<Integer>>();
List<Integer> list1 = Arrays.asList(1, 2, 3);
List<Integer> list2 = Arrays.asList(1, 2, 3);
// Adding list1 and list2 to the HashSet
setOfLists.add(list1);
setOfLists.add(list2);
// Checking if list1 is in the HashSet
System.out.println(setOfLists.contains(list1)); // Output is true
// Checking if new Array with same content is in the HashSet
System.out.println(setOfLists.contains(Arrays.asList(1, 2, 3))); // Output is true
// Checking the length of the HashSet
System.out.println(setOfLists.size()); // Output is 1
C# Code:
HashSet<List<int>> setOfLists = new HashSet<List<int>>();
List<int> list1 = new List<int> { 1, 2, 3 };
List<int> list2 = new List<int> { 1, 2, 3 };
// Adding list1 and list2 to the HashSet
setOfLists.Add(list1);
setOfLists.Add(list2);
// Checking if list1 is in the HashSet
Console.WriteLine(setOfLists.Contains(list1)); // Output is true
// Checking if new Array with same content is in the HashSet
Console.WriteLine(setOfLists.Contains(new List<int> { 1, 2, 3 })); // Output is false
// Checking the length of the HashSet
Console.WriteLine(setOfLists.Count); // Output is 2
I don't understand why this difference. As we can see in the Print statements, C# is unable to find the duplicates by the contents in case of objects. HashSets are supposed to be used to eliminate duplicates and do so in O(1) time.
The biggest difference I see from the previous answers are all the solutions suggested have time complexity of O(N) where N is the number of elements (lists) in the HashSet. However, in Java, this seems to happen without having to explicitly modifying the comparers, and with O(1) complexity for at least fairly smaller Lists.
Is there any alternate options to achieve that in C# to achieve this in O(1) time complexity?
Upvotes: 0
Views: 297
Reputation: 19555
The problem is that the List<T>
class in C# does not overwrite the Equals()
or GetHashCode()
method, see Check if two lists are equal. This means that the two lists
List<int> list1 = new List<int> { 1, 2, 3 };
List<int> list2 = new List<int> { 1, 2, 3 };
are not equal (according to Equals()
), see the following example code:
List<int> list1 = new List<int> { 1, 2, 3 };
List<int> list2 = new List<int> { 1, 2, 3 };
Console.WriteLine(list1);
Console.WriteLine(list2);
Console.WriteLine(list1.GetHashCode());
Console.WriteLine(list2.GetHashCode());
Console.WriteLine(list1.Equals(list2));
This could generate the following output:
System.Collections.Generic.List`1[System.Int32]
System.Collections.Generic.List`1[System.Int32]
37320431
339559
False
Therefore, the HashSet<List<int>>
instance contain two list references, which are (still) not equal, as seen in your code:
Console.WriteLine(setOfLists.Count); // Output is 2
The solution is to provide a custom IEqualityComparer implementation the HashSet
instance should use. It could look like this:
public class ListIntSequenceEqualComparator : IEqualityComparer<List<int>>
{
public bool Equals(List<int> x, List<int> y)
{
return x.SequenceEqual(y);
}
public int GetHashCode([DisallowNull] List<int> obj)
{
return obj.Sum(); // or some other valid hashcode calculation
}
}
With this implementation you get the expected result as in java:
HashSet<List<int>> setOfLists = new HashSet<List<int>>(new ListIntSequenceEqualComparator());
List<int> list1 = new List<int> { 1, 2, 3 };
List<int> list2 = new List<int> { 1, 2, 3 };
// Adding list1 and list2 to the HashSet
setOfLists.Add(list1);
setOfLists.Add(list2);
// Checking if list1 is in the HashSet
Console.WriteLine(setOfLists.Contains(list1));
// Checking if new Array with same content is in the HashSet
Console.WriteLine(setOfLists.Contains(new List<int> { 1, 2, 3 }));
// Checking the length of the HashSet
Console.WriteLine(setOfLists.Count);
This will generate the following output:
True
True
1
Upvotes: 2