Kesarion
Kesarion

Reputation: 2848

Algorithm to determine if a word could be English?

I have a list of strings that I need to check against an English dictionary. However I don't want to start checking every piece of gibberish in the list. First, I want to check if the string could be an English word.

Does anyone know of an algorithm that does this or at least the rules that I need to apply to verify a word?

For example:

No spoken word can start with more than 3 consonants, and if there are are 3 initial consonants in a word, the first one must be "s".

Upvotes: 4

Views: 2024

Answers (2)

Marcin
Marcin

Reputation: 49816

This sort of task is what computers are for. Use some kind of set structure (maybe a bloom filter) to store all the words in a dictionary, and just check your word against it. That's a constant time operation.

Upvotes: 0

Jeff Foster
Jeff Foster

Reputation: 44696

Finding a word in a data structure is going to be fast (e.g. use a Bloom filter (mind the false positives!), or a set) so chances are it is not worth doing this for efficiency reasons.

If you want to provide suggestions, then look at Peter Norvig's spell checking implementation.

If you really want to go that way, then I'd construct frequencies of A follows B from existing text to see whether any given sequence is contained within English words.

Upvotes: 4

Related Questions