Gregory William Bryant
Gregory William Bryant

Reputation: 539

What is the fastest (performance) way to compare if a string is present in a large list of objects?

Currently I have object which contains two strings:

class myClass
{
    public string string1 { get; set; }
    public string string2 { get; set; }

    public bool MatcheString1(string newString)
    {
        if (this.string1 == newString)
        {
            return true;
        }
        return false;
    }
}

I then have a second class that makes a list of the aforementioned object using List.

class URLs : IEnumerator, IEnumerable
{
    private List<myClass> myCustomList;
    private int position = -1;

    //  Constructor
    public URLs()
    {
        myCustomList = new List<myClass>();
    }
}

In that class I’m using a method to check if a string is present in the list

//  We can also check if the URL string is present in the collection
public bool ContainsString1(string newString)
{
    foreach (myClass entry in myCustomList)
    {
        if (entry. MatcheString1(newString))
        {
            return true;
        }
    }

    return false;
}

Essentially, as the list of objects grows to the 100,000 mark, this process becomes very slow. What is fast way to checking if that string is present? I’m happy to create a List outside of the class to validation, but that seems hacky to me?

Upvotes: 3

Views: 147

Answers (3)

sh1ng
sh1ng

Reputation: 2973

Search over sorted array(list) takes O(logN)

        var sortedList = new SortedSet<string>();
        sortedList.Add("abc");
        // and so on
        sortedList.Contains("test");

Search over HashSet takes O(1), but I guess in case of 100k elements(Log(100000)=5), and almost no difference to the HashSet that takes more memory.

Upvotes: 0

dcastro
dcastro

Reputation: 68710

If you don't mind using a different data structure, instead of a list, you could a dictionary where your objects are indexed by their string1 property.

public URLs()
{
    myDictionary = new Dictionary<string, myClass>();
}

Since Dictionary<TKey, TValue> can usually find elements in O(1) time, you can perform that check very fast.

if(myDictionary.ContainsKey(newString))
  //...

Upvotes: 2

Marc Gravell
Marc Gravell

Reputation: 1063338

Once the list of items is stable, you can compute a hash-set of the matches, for example:

// up-front work
var knownStrings = new HashSet<string>();
foreach(var item in myCustomList) knownStrings.Add(item.string1);

(note that this is not free, and will need to be re-computed as the list changes); then, later, you can just check:

return knownStrings.Contains(newString);

which is then very cheap (O(1) instead of O(N)).

Upvotes: 5

Related Questions