Brig Lamoreaux

Reputation:

What .NET collection provides the fastest search

I have 60k items that need to be checked against a 20k lookup list. Is there a collection object (like List, Hashtable) that provides an exceptionally fast Contains() method? Or will I have to write my own? In other words, does the default Contains() method just scan each item, or does it use a better search algorithm?

foreach (Record item in LargeCollection)
{
    if (LookupCollection.Contains(item.Key))
    {
       // Do something
    }
}

Note. The lookup list is already sorted.

Upvotes: 174

Views: 168241

Answers (10)

juFo

Reputation: 18567

As of .NET 8 you might also consider using System.Buffers.SearchValues<T>:

https://learn.microsoft.com/en-us/dotnet/api/system.buffers.searchvalues-1?view=net-8.0
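A minimal sketch of how SearchValues<T> is used (assumes .NET 8+; the value set here is made up for illustration). You precompute an optimized lookup for a fixed set of characters or bytes, then use it with the span search extensions:

```csharp
using System;
using System.Buffers;

class Program
{
    static void Main()
    {
        // Precompute an optimized membership structure for a fixed char set.
        SearchValues<char> vowels = SearchValues.Create("aeiou");

        // Use it with the span-based search extension methods.
        Console.WriteLine("rhythm".AsSpan().ContainsAny(vowels)); // False
        Console.WriteLine("hello".AsSpan().IndexOfAny(vowels));   // 1 ('e')
    }
}
```

Note that SearchValues<T> is aimed at scanning text/bytes for any of a small fixed set of values, rather than at general key lookups like the question describes.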

Also in .NET 8, you might have a look at System.Collections.Frozen.FrozenSet<T> or FrozenDictionary<TKey,TValue>:

https://learn.microsoft.com/en-us/dotnet/api/system.collections.frozen.frozenset-1?view=net-8.0
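A minimal sketch for the question's scenario (assumes .NET 8+; the sample keys are stand-ins for the real 20k lookup list). FrozenSet<T> trades a slower one-time build for very fast repeated Contains calls, which fits a build-once, query-many lookup list:

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Stand-in for the real 20k lookup keys.
        var lookupKeys = new List<string> { "A1", "B2", "C3" };

        // One-time build; optimized for read-only Contains lookups afterwards.
        FrozenSet<string> lookup = lookupKeys.ToFrozenSet();

        Console.WriteLine(lookup.Contains("B2")); // True
        Console.WriteLine(lookup.Contains("Z9")); // False
    }
}
```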

Upvotes: 1

Tod

Reputation: 2524

I've put a test together:

  • First, generate all possible 3-character combinations of A-Z and 0-9
  • Fill each of the collections mentioned here with those strings
  • Finally, search each collection for a random string and time it (the same string for each collection)

This test simulates a lookup when there is guaranteed to be a result.

[results chart: FullCollection]

Then I changed the initial collection from all possible combinations to only 10,000 random 3-character combinations. This should give about a 1-in-4.6 hit rate for a random 3-character lookup, so this is a test where a result is not guaranteed. I ran the test again:

[results chart: PartialCollection]

IMHO Hashtable, although fastest, isn't always the most convenient when working with objects. But a HashSet is so close behind that it's probably the one to recommend.

Just for fun (you know, fun) I ran it with 1.68M rows (4 characters): [results chart: BiggerCollection]

Upvotes: 14

Jimmy

Reputation: 91432

In the most general case, consider System.Collections.Generic.HashSet<T> as your default "Contains" workhorse data structure, because evaluating Contains takes constant time on average.

The actual answer to "What is the fastest searchable collection" depends on your specific data size, ordered-ness, cost-of-hashing, and search frequency.
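For the question's loop, a minimal sketch of the HashSet approach (the sample data is made up; in the real code the set would hold the 20k lookup keys):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Stand-in for the 20k lookup list.
        var lookupKeys = new List<int> { 2, 3, 5, 7 };

        // One-time O(n) build; after this, Contains is O(1) on average.
        var lookup = new HashSet<int>(lookupKeys);

        // Stand-in for the 60k items being checked.
        foreach (int key in new[] { 4, 5, 6, 7 })
        {
            if (lookup.Contains(key))
                Console.WriteLine(key); // prints 5 then 7
        }
    }
}
```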

Upvotes: 175

user3810900

Reputation:

You should read this blog that speed tested several different types of collections and methods for each using both single and multi-threaded techniques.

According to the results, a BinarySearch on a List and SortedList were the top performers, consistently running neck-and-neck when looking up something as a "value".

When using a collection that allows for "keys", the Dictionary, ConcurrentDictionary, HashSet, and Hashtable performed the best overall.

Upvotes: 13

Mark

Reputation: 882

Have you considered List.BinarySearch(item)?

You said that your lookup list is already sorted, so this seems like the perfect opportunity. A hash would definitely be the fastest, but it brings its own problems and requires a lot more storage overhead.
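A minimal sketch (sample values are made up). Since the list is sorted, List<T>.BinarySearch finds a key in O(log n) with no extra storage; it returns the element's index if found, or a negative number if not:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Must already be sorted for BinarySearch to work correctly.
        var sortedLookup = new List<int> { 1, 3, 5, 7, 9 };

        // Non-negative result means the key was found.
        Console.WriteLine(sortedLookup.BinarySearch(7) >= 0); // True (index 3)
        Console.WriteLine(sortedLookup.BinarySearch(4) >= 0); // False
    }
}
```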

Upvotes: 25

SLaks

Reputation: 887195

If you don't need ordering, try HashSet<Record> (new in .NET 3.5).

If you do, use a List<Record> and call BinarySearch.

Upvotes: 78

clemahieu

Reputation: 1429

Keep both lists x and y in sorted order.

If x == y, do your action; if x < y, advance x; if y < x, advance y; continue until either list is empty.

The run time of this merge-style intersection is O(size(x) + size(y)).

Don't run a .Contains() loop; that is O(size(x) * size(y)), which is much worse.
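A minimal sketch of this sorted merge walk (sample arrays are made up). Each comparison advances at least one pointer, so every element is visited at most once:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Both inputs must be sorted in advance.
        int[] x = { 1, 4, 6, 8 };
        int[] y = { 2, 4, 8, 10 };
        int i = 0, j = 0;

        while (i < x.Length && j < y.Length)
        {
            if (x[i] == y[j]) { Console.WriteLine(x[i]); i++; j++; } // match: do something
            else if (x[i] < y[j]) i++; // advance x
            else j++;                  // advance y
        }
        // prints 4 then 8
    }
}
```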

Upvotes: 4

Brian

Reputation: 25823

If you're using .NET 3.5, you can write cleaner code using:

foreach (Record item in LookupCollection.Intersect(LargeCollection))
{
  //dostuff
}

I don't have .NET 3.5 here, so this is untested. It relies on an extension method. Note that LookupCollection.Intersect(LargeCollection) is probably not the same as LargeCollection.Intersect(LookupCollection); the latter is probably much slower.

This assumes LookupCollection is a HashSet

Upvotes: 3

Rich Schuler

Reputation: 41972

If it's possible to sort your items, then there is a much faster way to do this than doing key lookups into a hashtable or b-tree. Though if your items aren't sortable you can't really put them into a b-tree anyway.

Anyway, if they are sortable, sort both lists; then it's just a matter of walking the lookup list in order.

Walk the lookup list
   While the current check-list item <= the current lookup-list item
     If check-list item == lookup-list item, do something
     Move to the next check-list item
   Move to the next lookup-list item

Upvotes: 3

Robert Horvick

Reputation: 4036

If you aren't worried about squeezing out every last bit of performance, the suggestion to use a HashSet or binary search is solid. Your datasets just aren't large enough for this to be a problem 99% of the time.

But if this is just one of thousands of times you are going to do this and performance is critical (and proven to be unacceptable using HashSet/binary search), you could certainly write your own algorithm that walks the sorted lists, doing comparisons as you go. Each list would be walked at most once, and even the pathological cases wouldn't be bad. Once you went this route, you'd probably find that the comparison (assuming the key is a string or other non-integral value) is the real expense, and that optimizing it would be the next step.

Upvotes: 2
