Reputation: 2848
I have a list of strings that I need to check against an English dictionary. However I don't want to start checking every piece of gibberish in the list. First, I want to check if the string could be an English word.
Does anyone know of an algorithm that does this or at least the rules that I need to apply to verify a word?
For example:
No spoken word can start with more than 3 consonants, and if there are are 3 initial consonants in a word, the first one must be "s".
Upvotes: 4
Views: 2024
Reputation: 49816
This sort of task is what computers are for. Use some kind of set structure (maybe a bloom filter) to store all the words in a dictionary, and just check your word against it. That's a constant time operation.
Upvotes: 0
Reputation: 44696
Finding a word in a data structure is going to be fast (e.g. use a Bloom filter (mind the false positives!), or a set) so chances are it is not worth doing this for efficiency reasons.
If you want to provide suggestions, then look at Peter Norvig's spell checking implementation.
If you really want to go that way, then I'd construct frequencies of A follows B from existing text to see whether any given sequence is contained within English words.
Upvotes: 4