user16168
user16168

Reputation: 645

spell checker uses language model

I look for spell checker that could use language model.

I know there is a lot of good spell checkers such as Hunspell, however as I see it doesn't relate to context, so it only token-based spell checker.

for example,

I lick eating banana

so here at token-based level no misspellings at all, all words are correct, but there is no meaning in the sentence. However "smart" spell checker would recognize that "lick" is actually correctly written word, but may be the author meant "like" and then there is a meaning in the sentence.

I have a bunch of correctly written sentences in the specific domain, I want to train "smart" spell checker to recognize misspelling and to learn language model, such that it would recognize that even thought "lick" is written correctly, however the author meant "like".

I don't see that Hunspell has such feature, can you suggest any other spell checker, that could do so.

Upvotes: 5

Views: 1675

Answers (2)

AaronD
AaronD

Reputation: 1701

One way to do this is via a character-based language model (rather than a word-based n-gram model). See my answer to Figuring out where to add punctuation in bad user generated content?. The problem you're describing is different, but you can apply a similar solution. And, as I noted there, the LingPipe tutorial is a pretty straightforward way of developing a proof-of-concept implementation.

One important difference - to capture more context, you may want to train a larger n-gram model than the one I recommended for punctuation restoration. Maybe 15-30 characters? You'll have to experiment a little there.

Upvotes: 0

Daniel Naber
Daniel Naber

Reputation: 1654

See "The Design of a Proofreading Software Service" by Raphael Mudge. He describes both the data sources (Wikipedia, blogs etc) and the algorithm (basically comparing probabilities) of his approach. The source of this system, After the Deadline, is available, but it's not actively maintained anymore.

Upvotes: 1

Related Questions