Reputation: 151
I'm trying to make a word puzzle game, and for that I'm using a recursive method to find all possible words in the given letters. The letters is in a 4x4 board.
Like this:
ABCD
EFGH
HIJK
LMNO
The recursive method is called inside this loop:
for (int y = 0; y < width; y++)
{
for (int x = 0; x < height; x++)
{
myScabble.Search(letters, y, x, width, height, "", covered, t);
}
}
letters is a 2D array of chars.
y & x are ints that shows where in the board
width & height is also int, that tells the dimensions of the board
"" is the string we are trying to make (the word)
covered is an array of bools, to check if we already used that square.
t is a List (which contains all the words to check against).
The recursive method that need optimizing:
public void Search(char[,] letters, int y, int x, int width, int height, string build, bool[,] covered, List<aWord> tt)
{
// Dont get outside the bounds
if (y >= width || y < 0 || x >= height || x < 0)
{
return;
}
// Dont deal with allrady covered squares
if (covered[x, y])
{
return;
}
// Get Letter
char letter = letters[x, y];
// Append
string pass = build + letter;
// check if its a possible word
//List<aWord> t = myWords.aWord.Where(w => w.word.StartsWith(pass)).ToList();
List<aWord> t = tt.Where(w => w.word.StartsWith(pass)).ToList();
// check if the list is emphty
if (t.Count < 10 && t.Count != 0)
{
//stop point
}
if (t.Count == 0)
{
return;
}
// Check if its a complete word.
if (t[0].word == pass)
{
//check if its allrdy present in the _found dictinary
if (!_found.ContainsKey(pass))
{
//if not add the word to the dictionary
_found.Add(pass, true);
}
}
// Check to see if there is more than 1 more that matches string pass
// ie. are there more words to find.
if (t.Count > 1)
{
// make a copy of the covered array
bool[,] cov = new bool[height, width];
for (int i = 0; i < width; i++)
{
for (int a = 0; a < height; a++)
{
cov[a, i] = covered[a, i];
}
}
// Set the current square as covered.
cov[x, y] = true;
// Continue in all 8 directions.
Search(letters, y + 1, x, width, height, pass, cov, t);
Search(letters, y, x + 1, width, height, pass, cov, t);
Search(letters, y + 1, x + 1, width, height, pass, cov, t);
Search(letters, y - 1, x, width, height, pass, cov, t);
Search(letters, y, x - 1, width, height, pass, cov, t);
Search(letters, y - 1, x - 1, width, height, pass, cov, t);
Search(letters, y - 1, x + 1, width, height, pass, cov, t);
Search(letters, y + 1, x - 1, width, height, pass, cov, t);
}
}
The code works as I expected it to do, however it is very slow.. it takes about 2 mins to find the words.
EDIT: I clarified that the letters array is 2D
I want to show how I got the algorithm to work really fast. I used @Enigmativity 's implementation of a Trie, with the search pattern described by @EricLippert
public void SearchWord(char[,] letters, Trie parentTrie, char[] build, int x, int y, bool[,] covered )
{
char[] pass = new char[build.Length + 1];
build.CopyTo(pass, 0);
// iterate through all squares in the board.
for (var r = 0; r < letters.GetLength(0); r++ )
{
for (var c = 0; c < letters.GetLength(1); c++)
{
//check if this square is naighbor to the last square
if ((IsNeighbor(x, y, r, c)|| x == -1) && !(covered[r, c]))
{
// check if the current Trie contains the letter
if (parentTrie.ContainsKey(letters[r,c]))
{
pass[build.Length] = letters[r, c];
covered[r, c] = true;
SearchWord(letters, parentTrie[letters[r, c]], pass, r, c, covered);
covered[r, c] = false;
}
if (parentTrie.ContainsKey('$') && (!myStrings.Contains(new string(build).ToLower())))
myStrings.Add(new string(build).ToLower());
}
}
}
}
It is initially called by this:
SearchWord(letters, trie, new char[0], -1, -1, new bool[letters.GetLength(0), letters.GetLength(1)]);
I realize that I could add letters as a property, but since its a reference type it is not very time expensive
Upvotes: 2
Views: 1597
Reputation: 3894
One optimization would be in your directional search. You only need to search down, left, and down-left diagonally as searching up, right, and up-right diagonally will give you the same result as simply reversing the string.
EDIT:
Of interest also might be Project Euler Problem #11 and a writeup of a solution using C# and LINQ here.
This is a completely untested example of what I was trying to describe.
static List<string> Solve()
{
// Declaring a empty list of strings to hold our results up front.
List<string> words = new List<string>();
// I'm using set as the term for your grid of letters.
string set = @"ABCD
EFGH
HIJK
LMNO";
// i'm explicitly defining the dimensions, you need to figure this out.
int sizeX = 4;
int sizeY = 4;
// i'm also specifying a maximum word length. you might find a word like
// supercalifragilisticexpialidocious, but i doubt it so lets not waste time.
int maximumWordSize = 3;
// first, our trie/wordlist/etc. assume `GetWordList()` gets a list of all
// valid words with indicated number of characters.
List<string> wordList = GetWordList(3);
// second, we load a character array with the set.
char[,] data = new char[sizeX, sizeY];
string[] lines = set.Split('\n');
for (int i = 0; i <= lines.Count() -1; i++)
{
string line = lines[i].Trim();
for (int j = 0; j <= line.Length - 1; j++)
{
char[j,i] = lines[j];
}
}
// third, we iterate over the data
for(int x = 0; x <= sizeX - maximumWordSize; x++)
{
for (int y = 0; y <= sizeY - maximumWordSize; y++)
{
// check to see if we even have any words starting with our cursor
var validWords = wordList.Where(w=>w.Contains(data[x,y]));
if (validWords.Count() > 0)
{
// ok, we have words. continue on our quest!
// (this is where your initial qualifier changes if you use a trie
// or other search method)
char[] characters = char[maximumWordSize];
// search left
for (int i = x; i <= x + maximumWordSize - 1; i++)
characters[i] = data[i, y];
words.AddRange(Search(characters, wordList));
// search down
for (int i = y + maximumWordSize - 1; i <= y; i--)
characters[i] = data[x, y];
words.AddRange(Search(characters, wordList));
// search diagonal right
for (int i = x; i <= x + maximumWordSize - 1; i++)
for (int j = y + maximumWordSize - 1; j <= y; j--)
characters[i] = data[i, j];
words.AddRange(Search(characters, wordList));
// search diagonal left
for (int i = x; i <= x - MaximumWordSize + 1; i++)
for (int j = y + maximumWordSize - 1; j <= y; j--)
characters[i] = data[i, j];
words.AddRange(Search(characters, wordList));
}
}
}
return words;
}
static List<string> Search(char[] Input, List<string> WordList)
{
List<string> result = new List<string>();
string word = "";
// find forwards
for (int i = 0; i <= Input.Length - 1; i++)
{
word += Input[i];
if (WordList.Contains(word))
result.Add(word);
}
// find backwards
Array.Reverse(Input);
for (int i = 0; i <= Input.Length - 1; i++)
{
word += Input[i];
if (WordList.Contains(word))
result.Add(word);
}
return result;
}
Upvotes: 2
Reputation: 117144
I thought I might put forward code based on Eric Lippert's answer. Eric nailed it, but hard code is always better. ;-)
To start with, here is a simple implementation of a trie:
public class Trie : Dictionary<char, Trie>
{
public int Frequency { get; set; }
public void Add(IEnumerable<char> chars)
{
if (chars == null) throw new System.ArgumentNullException("chars");
if (chars.Any())
{
var head = chars.First();
if (!this.ContainsKey(head))
{
this.Add(head, new Trie());
}
var tail = this.GetSafeTail(head, chars.Skip(1));
if (tail.Any())
{
this[head].Add(tail);
}
}
}
public bool Contains(IEnumerable<char> chars)
{
if (chars == null) throw new System.ArgumentNullException("chars");
var @return = false;
if (chars.Any())
{
var head = chars.First();
if (this.ContainsKey(head))
{
var tail = this.GetSafeTail(head, chars.Skip(1));
@return = tail.Any() ? this[head].Contains(tail) : true;
}
}
return @return;
}
private IEnumerable<char> GetSafeTail(char head, IEnumerable<char> tail)
{
return ((!tail.Any()) && (head != '$')) ? new [] { '$', } : tail;
}
}
This code allows you to pass strings to the Add
& Contains
methods as strings are IEnumerable<char>
.
It can be used like so:
var trie = new Trie();
var before = trie.Contains("Hello"); // == false
trie.Add("Hello");
var after = trie.Contains("Hello"); // == true
Now, given a grid of letters and a trie loaded up with the possible words I can run the following query to get the matches:
var matches =
from w in this.GetPossibleWords(letters)
where trie.Contains(w)
select w;
The GetPossibleWords
method is implemented as such:
public IEnumerable<string> GetPossibleWords(char[,] letters)
{
return
from ws in this.GetPossibleWordLists(letters)
from w in ws
select w;
}
private IEnumerable<IEnumerable<string>> GetPossibleWordLists(char[,] letters)
{
Func<int, int> inc = x => x + 1;
Func<int, int> nop = x => x;
Func<int, int> dec = x => x - 1;
for (var r = letters.GetLowerBound(0);
r <= letters.GetUpperBound(0); r++)
{
for (var c = letters.GetLowerBound(1);
c <= letters.GetUpperBound(1); c++)
{
yield return new [] { letters[r, c].ToString(), };
yield return this.GetPossibleWords(letters, r, c, dec, dec);
yield return this.GetPossibleWords(letters, r, c, inc, dec);
yield return this.GetPossibleWords(letters, r, c, nop, dec);
yield return this.GetPossibleWords(letters, r, c, dec, nop);
yield return this.GetPossibleWords(letters, r, c, inc, nop);
yield return this.GetPossibleWords(letters, r, c, nop, inc);
yield return this.GetPossibleWords(letters, r, c, dec, inc);
yield return this.GetPossibleWords(letters, r, c, inc, inc);
}
}
}
private IEnumerable<string> GetPossibleWords(char[,] letters,
int r, int c,
Func<int, int> rd, Func<int, int> cd)
{
var chars = new List<char>();
while (r >= letters.GetLowerBound(0) && r <= letters.GetUpperBound(0)
&& c >= letters.GetLowerBound(1) && c <= letters.GetUpperBound(1))
{
chars.Add(letters[r, c]);
if (chars.Count > 1)
{
yield return new string(chars.ToArray());
}
r = rd(r);
c = cd(c);
}
}
The performance of this solution seems quite good.
The OP had a 4x4 grid taking roughly 120 seconds (2 mins). My code can match a list of 208,560 words in a 40x40 grid in 20.3 seconds.
Not bad, huh?
Thanks again to Eric for the idea of using a trie.
Upvotes: 3
Reputation: 30989
I don't know C# too well, but here's how I would do it in C++ (some things are pseudocode to make it easier to read); it should be easy to translate:
struct word_search {
const set<string>& words; // This is a balanced search tree
const vector<vector<char> >& letters;
const int width, height;
vector<vector<bool> > marks;
string word_so_far;
// Add constructor
void search(int x, int y) {
if (x < 0 || x >= width || y < 0 || y >= height || marks[x][y]) return;
word_so_far += letters[x][y];
set<string>::const_iterator it = words.lower_bound(word_so_far);
if (it == words.end() ||
it->substr(0, word_so_far.size()) != word_so_far) {
word_so_far.resize(word_so_far.size() - 1);
return;
}
marks[x][y] = true;
for (int dx = -1; dx <= 1; ++dx)
for (int dy = -1; dy <= 1; ++dy)
search(x + dx, y + dy);
marks[x][y] = false;
word_so_far.resize(word_so_far.size() - 1);
}
};
In C#, set<string>
would be SortedSet
, and the vector<vector<...> >
types would be 2-D arrays. I am not sure of the equivalent to lower_bound
, though; SortedSet
doesn't seem to have anything like it.
Upvotes: 0
Reputation: 660387
The other answers are right: you should abandon this algorithm entirely and start over.
The way to deal with these problems in word games is to transform the dictionary into a form that is amenable to the kinds of searches you want to do. In your case, the data structure you want to build is called a trie, which is a pun because it is a "tree" that does fast "re-trie-val", ha ha ha, those computer science guys are very witty!
The way a trie works is you have a tree where every node has up to 27 children. Suppose you have the dictionary { AB, ACE, ACT, CAT }. The trie looks like this:
root
/ \
A C
/ \ |
B C A
| / \ |
$ E T T
| | |
$ $ $
where $ means "the word is ended". Every path from the root to a $ is a legal word.
Now to find all the words, first build the trie out of the dictionary. Then for each starting point in the grid, try every possible word in the trie. If you start with a grid square that contains an A then of course you do not traverse down the C branch of the trie. Then for each child of the A in the trie, see if the grid has a neighbouring square that can drive you deeper into the trie.
That's the interesting recursive algorithm, and you'll find that it is very fast because you can be very smart about abandoning huge numbers of words that can't possibly exist starting from a given square.
Does that all make sense?
Upvotes: 17
Reputation: 17274
List
to store all the words use some tree-like structure. Or at least a sorted list + binary search. This point should improve performance dramatically. I would recommend using ordered collection and passing range indices through recursice methods. I.e. you will look for first letter and determine that words starting with that letter have 123-456 indices. Next time you append new letter you will make this range even more narrow and you won't need to go through entire collection! ToList
method). You can use it's lazy nature to iterate only until you need since you do not care how many words start with this subword if this number is more than 2. cov[x, y] = true;
and after nested search tries put back false
value: cov[x, y] = false;
Upvotes: 2
Reputation: 14783
It is probably more efficient to attack this in the opposite direction. Iterate over you list of possible words, look for the first letter of the word in the grid, if you find it look for the second in adjacent square, if you find it keep trying to match further along that line.
This way you can quickly eliminate words entirely rather than repeatedly check for every word in every possible position.
Just looking at the code you are using in something that is going to repeat thousands of times LINQ is probably a no-no. You are using string manipulation, so you are generating thousands of strings in memory, and possibly causing the garbage collector to run during the recursion, you should switch to string builder or some sort of character array.
There are loads of optimisations you can do at this point, for example do a quick first pass on your list checking that every letter of each word is somewhere in the grid (just put all letters in the grid into a string to make this easy), then you can quickly eliminate words where not all their letters are in the grid - before even worrying about if they are in the right order.
You could turn your grid into a series of strings, each representing a certain orientation of the grid and then just use plain string searches. If you find a match you would just need to make sure the match wasn't across the boundary of the grid which can be done by checking the start and end point of the match quite quickly.
Upvotes: 4
Reputation: 160952
Another optimization would be data structures:
List<aWord> t = tt.Where(w => w.word.StartsWith(pass)).ToList();
This is O(n) over all valid words, you could improve the performance by using i.e. a trie.
Upvotes: 1
Reputation: 35477
Make the invariant arguments (letter, width, height and tt) fields. Do not use Linq, it is slow over many iterations. Also, change your list of words to a more effecient Dictionary or some type of sorted list.
Upvotes: 4