Reputation: 2984
I have a big list of strings (about 5k-20k entries) that I need to sort and also to remove duplicates from.
I've done this in two ways now, once with a HashSet and once solely with LINQ. Tests with that number of entries did not show a big difference, but I'm wondering which way, and thus which method, would be better suited.
The two ways (myList is of type List<String>):
LINQ: I'm using a single LINQ statement to order the list and get the distinct values from it.
myList = myList.OrderBy(q => q).Distinct().ToList();
HashSet: I'm using a HashSet to remove all duplicates, and then I'm ordering the list.
myList = new HashSet<String>(myList).ToList<String>();
myList = myList.OrderBy(q => q).ToList();
Like I said, the tests I made showed about the same time consumption for both methods, but I'm still wondering whether one method is better than the other, and if so, why (the code is in a high-performance part and I need to squeeze every millisecond I can out of it).
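For reference, a minimal timing sketch of the two approaches using System.Diagnostics.Stopwatch (the class and variable names here are my own, and the generated test data is made up, so absolute timings will vary by machine and data shape):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class DedupTiming
{
    static void Main()
    {
        // Build a test list in the 5k-20k range with plenty of duplicates.
        var rng = new Random(42);
        var myList = Enumerable.Range(0, 20000)
                               .Select(_ => "item" + rng.Next(5000))
                               .ToList();

        var sw = Stopwatch.StartNew();
        var viaLinq = myList.OrderBy(q => q).Distinct().ToList();
        sw.Stop();
        Console.WriteLine($"LINQ:    {sw.ElapsedMilliseconds} ms, {viaLinq.Count} items");

        sw.Restart();
        var viaHashSet = new HashSet<string>(myList).OrderBy(q => q).ToList();
        sw.Stop();
        Console.WriteLine($"HashSet: {sw.ElapsedMilliseconds} ms, {viaHashSet.Count} items");

        // Both approaches must produce the same sorted, duplicate-free list.
        Console.WriteLine(viaLinq.SequenceEqual(viaHashSet)); // True
    }
}
```

For stable numbers on a hot path like this, a proper benchmark harness (e.g. BenchmarkDotNet) would be more trustworthy than a single Stopwatch run.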
Upvotes: 6
Views: 7640
Reputation: 124726
If you're really concerned about every nanosecond, then
myList = myList.Distinct().OrderBy(q => q).ToList();
might be slightly faster than:
myList = myList.OrderBy(q => q).Distinct().ToList();
if there are a large number of duplicates.
The LINQ method is more readable and will have similar performance to explicitly creating a HashSet<T>, as others have said. In fact, it may be slightly faster if the original list is already sorted, since the LINQ method will preserve the initial order before sorting, while explicitly creating a HashSet<T> will enumerate in an undefined order.
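To illustrate the ordering point (a small sketch with made-up input values): Distinct yields elements in first-seen order, so on an already-sorted list it hands OrderBy nearly sorted input, whereas HashSet<T> enumeration order is unspecified and only the final OrderBy guarantees the result:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DistinctOrderDemo
{
    static void Main()
    {
        var myList = new List<string> { "apple", "apple", "banana", "cherry", "banana" };

        // Distinct preserves first-seen order, so OrderBy receives
        // input that is already (nearly) sorted here.
        var distinctFirst = myList.Distinct().OrderBy(q => q).ToList();

        // HashSet<T> enumerates in an undefined order; the OrderBy
        // at the end is what guarantees the sorted result.
        var viaHashSet = new HashSet<string>(myList).OrderBy(q => q).ToList();

        Console.WriteLine(string.Join(", ", distinctFirst)); // apple, banana, cherry
        Console.WriteLine(distinctFirst.SequenceEqual(viaHashSet)); // True
    }
}
```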
Upvotes: 8
Reputation: 101701
They are pretty much the same. Distinct also uses a Set<T> internally to eliminate duplicates. My suggestion is to use Distinct first, then sort your items. Also, in your second snippet the ToList<String> call is redundant: you can use OrderBy directly on the HashSet and then call ToList.
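Putting that suggestion together, the second snippet collapses to a single line (a sketch with made-up sample data):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SimplifiedDedup
{
    static void Main()
    {
        var myList = new List<string> { "b", "a", "b", "c", "a" };

        // No intermediate ToList<String>() needed: OrderBy accepts any
        // IEnumerable<T>, including a HashSet<T>.
        myList = new HashSet<string>(myList).OrderBy(q => q).ToList();

        Console.WriteLine(string.Join(", ", myList)); // a, b, c
    }
}
```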
Upvotes: 0