Reputation: 2984
I have a big list of strings (about 5k-20k entries) that I need to sort and also to remove duplicates from.
I've done this in two ways now, once with a HashSet and once solely with LINQ. Tests with that number of entries did not show a big difference, but I'm wondering which way, and thus which method, would be better suited.
The two ways (myList is of type List<String>):
LINQ: I'm using a single LINQ statement to order the list and get the distinct values from it.
myList = myList.OrderBy(q => q).Distinct().ToList();
HashSet: I'm using a HashSet to remove all duplicates, and then I'm ordering the list.
myList = new HashSet<String>(myList).ToList<String>();
myList = myList.OrderBy(q => q).ToList();
Like I said, the tests I made showed about the same time consumption for both methods, but I'm still wondering whether one method is better than the other, and if so, why (the code is in a high-performance part and I need to squeeze every millisecond I can out of it).
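For reference, a minimal timing sketch of the two approaches using System.Diagnostics.Stopwatch (the class and variable names here are my own, and the generated test data is made up, so absolute timings will vary by machine and data shape):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class DedupTiming
{
    static void Main()
    {
        // Build a test list in the 5k-20k range with plenty of duplicates.
        var rng = new Random(42);
        var myList = Enumerable.Range(0, 20000)
                               .Select(_ => "item" + rng.Next(5000))
                               .ToList();

        var sw = Stopwatch.StartNew();
        var viaLinq = myList.OrderBy(q => q).Distinct().ToList();
        sw.Stop();
        Console.WriteLine($"LINQ:    {sw.ElapsedMilliseconds} ms, {viaLinq.Count} items");

        sw.Restart();
        var viaHashSet = new HashSet<string>(myList).OrderBy(q => q).ToList();
        sw.Stop();
        Console.WriteLine($"HashSet: {sw.ElapsedMilliseconds} ms, {viaHashSet.Count} items");

        // Both approaches must produce the same sorted, duplicate-free list.
        Console.WriteLine(viaLinq.SequenceEqual(viaHashSet)); // True
    }
}
```

For stable numbers on a hot path like this, a proper benchmark harness (e.g. BenchmarkDotNet) would be more trustworthy than a single Stopwatch run.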
Upvotes: 6
Views: 7640
Reputation: 124726
If you're really concerned about every nanosecond, then
myList = myList.Distinct().OrderBy(q => q).ToList();
might be slightly faster than:
myList = myList.OrderBy(q => q).Distinct().ToList();
if there are a large number of duplicates.
The LINQ method is more readable and will have similar performance to explicitly creating a HashSet<T>, as others have said. In fact, it may be slightly faster if the original list is already sorted, since the LINQ method will preserve the initial order before sorting, while explicitly creating a HashSet<T> will enumerate in an undefined order.
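To illustrate the ordering point (a small sketch with made-up input values): Distinct yields elements in first-seen order, so on an already-sorted list it hands OrderBy nearly sorted input, whereas HashSet<T> enumeration order is unspecified and only the final OrderBy guarantees the result:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DistinctOrderDemo
{
    static void Main()
    {
        var myList = new List<string> { "apple", "apple", "banana", "cherry", "banana" };

        // Distinct preserves first-seen order, so OrderBy receives
        // input that is already (nearly) sorted here.
        var distinctFirst = myList.Distinct().OrderBy(q => q).ToList();

        // HashSet<T> enumerates in an undefined order; the OrderBy
        // at the end is what guarantees the sorted result.
        var viaHashSet = new HashSet<string>(myList).OrderBy(q => q).ToList();

        Console.WriteLine(string.Join(", ", distinctFirst)); // apple, banana, cherry
        Console.WriteLine(distinctFirst.SequenceEqual(viaHashSet)); // True
    }
}
```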
Upvotes: 8
Reputation: 101701
They are pretty much the same. Distinct also uses a Set<T> internally to eliminate duplicates. My suggestion is to use Distinct first, then sort your items. Also, in your second snippet the ToList<String> call is redundant: you can use OrderBy directly on the HashSet and then call ToList.
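Putting that suggestion together, the second snippet collapses to a single line (a sketch with made-up sample data):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SimplifiedDedup
{
    static void Main()
    {
        var myList = new List<string> { "b", "a", "b", "c", "a" };

        // No intermediate ToList<String>() needed: OrderBy accepts any
        // IEnumerable<T>, including a HashSet<T>.
        myList = new HashSet<string>(myList).OrderBy(q => q).ToList();

        Console.WriteLine(string.Join(", ", myList)); // a, b, c
    }
}
```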
Upvotes: 0