Reputation: 105
case 15: {
for (int i = 0; i < words.Count; i++) {
if (words[i].Length == 8) {
var tupled = words[i].ConcatCheck();
for (int n = 0; n < word.Count; n++)
if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
temp++;
}
if (temp >= 2)
matches.Add(words[i]);
temp = 0;
}
break;
}
What it does:
The first 'for loop' iterates through a List
of words about 248000 elements long, checking for words of length 8.
When one is found, I create a Tuple
of the first and last half of the word (4 letters per half) by calling the ConcatCheck()
method (an Extension method I wrote for obj String). That part is fast and fine.
What really needs work is the second 'for loop'. Every single 8 letter word activates this loop, which iterates over an even larger List
of about 267000 elements, checking to see if both Items of the Tuple
exist. If both are found, I have the original word added to a list "matches".
This part takes almost 3 minutes to find all matches in the 248k dictionary I have. Any way to optimize/speed it up?
Upvotes: 2
Views: 286
Reputation: 6643
O(n) using dictionary:
IList<string> words1 = new List<string>{...};
var wordsWithLengthOf8 = words1.Where(w => w.Length == 8).ToList();
IDictionary<string,string> wordsWithLengthOf8Dic = wordsWithLengthOf8.ToDictionary(w => w);
IList<string> words2 = new List<string>{...};
IList<string> matches = new List<string>();
for (int i = 0; i < wordsWithLengthOf8.Count; i++)
{
var tupled = wordsWithLengthOf8[i].ConcatCheck();
var isMatch = wordsWithLengthOf8Dic.ContainsKey(tupled.Item1) && wordsWithLengthOf8Dic.ContainsKey(tupled.Item2);
if (isMatch)
{
matches.Add(wordsWithLengthOf8[i]);
}
}
Upvotes: -1
Reputation: 13394
If you simply want to check if a word exists in a collection, use a HashSet
instead of a List
or Array
. The HashSet
class is optimized for Contains
checks.
Example
With the follwing code I found all 8 letter words made up of two 4 letter words in the english dictionary (github version) in under 50 ms.
WebClient client = new WebClient();
string dictionary = client.DownloadString(
@"https://raw.githubusercontent.com/dwyl/english-words/master/words.txt");
Stopwatch watch = new Stopwatch();
watch.Start();
HashSet<string> fourLetterWords = new HashSet<string>();
using (StringReader reader = new StringReader(dictionary))
{
while (true)
{
string line = reader.ReadLine();
if (line == null) break;
if (line.Length != 4) continue;
fourLetterWords.Add(line);
}
}
List<string> matches = new List<string>();
using (StringReader reader = new StringReader(dictionary))
{
while (true)
{
string line = reader.ReadLine();
if (line == null) break;
if (line.Length != 8) continue;
if (fourLetterWords.Contains(line.Substring(0, 4)) &&
fourLetterWords.Contains(line.Substring(4, 4)))
matches.Add(line);
}
}
watch.Stop();
Why is your code so slow?
for (int n = 0; n < word.Count; n++)
if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
temp++;
This part is one of the culprits. Instead of checking Are both parts contained in my array?
you are checking Are 2 or more of my 2 words contained in an array?
.
You could optimize this part by breaking the loop once you found both words.
if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
if(++temp >= 2) break;
Further optimizations could be made (depending on how often you run this search) by pre-sorting your words by length or alphabetically.
Upvotes: 3