Reputation: 25919
I have a list of words describing actions, for example:
New
Open
Save
Save as
Copy
Paste
Cut
Select all
etc.
I'd like the user to be able to find commands by entering only a couple letters in succession. So if user entered, for instance, "ae", he should receive:
sAvE
sAvE as
pAstE
In general, when user enters "abc", I want to return all strings matching regular expression .*a.*b.*c.*
. Since verifying whether string matchest this expression is linear and bruteforce algorithm is also linear, regular expressions won't help much in optimizing the search.
Important thing about this list is that it is known by compile-time, so I may design a data structure, which will hold all those terms to speed up the search.
Is there a data structure or an algorithm, which would speed up finding all matching words for specific user entry beyond O(m*n) (where m is count of terms and n - average term length)?
Upvotes: 1
Views: 1298
Reputation: 134075
This sounds like a case of premature optimization to me. Ridiculously premature. Even if you have 70 commands instead of just seven, the amount of time it will take to do a sequential search of all your commands is so small that your user won't notice it. And it's not like this is a function that you'll be calling hundreds or thousands of times per second. So spending hours implementing a fancy search to save a few milliseconds here and there is just wasted time. It's likely that the amount of time your users save over the life of the program won't even come close to the amount of time you'll spend designing, writing, and debugging your optimized solution.
You have a small number of very short commands. Computers are fast. There is no problem to solve here. Go spend your time on features that will actually benefit the user.
Now, if you had a very large number (tens of thousands) of strings to search, then you might benefit from some optimization. In that case . . .
You could start by making a dictionary that's keyed by letter, and the value is a list of all words that contain that letter. So your example would be something like:
a, [Save, Save as, Paste]
c, [Copy, Cut, Select all]
e, [New, Open, Save, Save as, Paste, Select all]
n, [New, Open]
... etc.
Then, make entries in the dictionary for "letter follows letter". This is going to get large in a hurry. "Paste," for example will have entries:
pa ps pt pe as at ae st se te
You can continue making those keys for longer substrings. For example, you get:
pas pat pae pst pse pte
This can be very effective when the strings are short. It becomes less effective when the strings get longer because the likelihood of a string containing a particular letter combination increases as the string length increases.
You could probably save some space by creating a trie, but the technique is essentially the same.
Also potentially useful: suffix tree and generalized suffix tree.
Upvotes: 3
Reputation: 27743
I don't know much about its time complexity, yet if we just wish to search for words that must have a
and e
in it, we would be starting with an expression similar to the following, then we would optimize for runtime:
(?=.*a)(?=.*e).*
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(?=.*a)(?=.*e).*";
string input = @"New
Open
Save
Save as
Copy
Paste
Cut
Select all";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
Upvotes: 0
Reputation: 481
As a wrote in my comment, probably you shouldn't worry about performance too much...
However, this data structure should fit your needs, let's call it a: Letter X before Y tree. Basically, a tree where each node has children for each letter if the letter appears after the parent nodes letter in the target word For each node you would store a list of all the matches.
public class Node
{
public char Letter { get; set; }
public Node[] Children {get;set;}
public List<string> CommandNames {get;set;}
}
You can pre-fill it with all your known names like this
p paste
a paste
s paste
..
p/a paste
p/s paste
p/t paste
..
t/e paste
..
p/a/s paste
..
s/t/e paste
Matching your search word simply means traversing your tree for each letter entered by the user, so it boils down to O(n)
Upvotes: 1