CocoaMix86
CocoaMix86

Reputation: 105

Iterating over multi-thousand element List

case 15: {
    for (int i = 0; i < words.Count; i++) {
        if (words[i].Length == 8) {
            var tupled = words[i].ConcatCheck();
            for (int n = 0; n < word.Count; n++)
                if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
                    temp++;
        }
        if (temp >= 2)
            matches.Add(words[i]);
        temp = 0;
    }
    break;
}

What it does:
The first 'for loop' iterates through a List of words about 248000 elements long, checking for words of length 8. When one is found, I create a Tuple of the first and last half of the word (4 letters per half) by calling the ConcatCheck() method (an Extension method I wrote for obj String). That part is fast and fine.

What really needs work is the second 'for loop'. Every single 8 letter word activates this loop, which iterates over an even larger List of about 267000 elements, checking to see if both Items of the Tuple exist. If both are found, I have the original word added to a list "matches".

This part takes almost 3 minutes to find all matches in the 248k dictionary I have. Any way to optimize/speed it up?

Upvotes: 2

Views: 286

Answers (2)

ozba
ozba

Reputation: 6643

O(n) using dictionary:

            IList<string> words1 = new List<string>{...};
            var wordsWithLengthOf8 = words1.Where(w => w.Length == 8).ToList();
            IDictionary<string,string> wordsWithLengthOf8Dic = wordsWithLengthOf8.ToDictionary(w => w);
            IList<string> words2 = new List<string>{...};
            IList<string> matches = new List<string>();   

            for (int i = 0; i < wordsWithLengthOf8.Count; i++)
            {
                var tupled = wordsWithLengthOf8[i].ConcatCheck();
                var isMatch = wordsWithLengthOf8Dic.ContainsKey(tupled.Item1) && wordsWithLengthOf8Dic.ContainsKey(tupled.Item2);
                if (isMatch)
                {
                    matches.Add(wordsWithLengthOf8[i]);
                }
            }

Upvotes: -1

Manfred Radlwimmer
Manfred Radlwimmer

Reputation: 13394

If you simply want to check if a word exists in a collection, use a HashSet instead of a List or Array. The HashSet class is optimized for Contains checks.

Example

With the follwing code I found all 8 letter words made up of two 4 letter words in the english dictionary (github version) in under 50 ms.

WebClient client = new WebClient();
string dictionary = client.DownloadString(
    @"https://raw.githubusercontent.com/dwyl/english-words/master/words.txt");

Stopwatch watch = new Stopwatch();
watch.Start();

HashSet<string> fourLetterWords = new HashSet<string>();

using (StringReader reader = new StringReader(dictionary))
{
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        if (line.Length != 4) continue;

        fourLetterWords.Add(line);
    }
}

List<string> matches = new List<string>();

using (StringReader reader = new StringReader(dictionary))
{
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        if (line.Length != 8) continue;

        if (fourLetterWords.Contains(line.Substring(0, 4)) &&
            fourLetterWords.Contains(line.Substring(4, 4)))
            matches.Add(line);
    }
}

watch.Stop();    

Why is your code so slow?

for (int n = 0; n < word.Count; n++)
    if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
        temp++;

This part is one of the culprits. Instead of checking Are both parts contained in my array? you are checking Are 2 or more of my 2 words contained in an array?.

You could optimize this part by breaking the loop once you found both words.

if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
    if(++temp >= 2) break;         

Further optimizations could be made (depending on how often you run this search) by pre-sorting your words by length or alphabetically.

Upvotes: 3

Related Questions