Kristine Sarah Tan

Reputation: 99

Check whether an input string contains only allowable words from a file

I'm starting to write a program that checks whether the word(s) a user enters are spelled correctly. If they are not, the program should correct them letter by letter, for example by moving a letter from one position to another. The correct spellings come from a list of words in a .txt file.

e.g. input:

"tihs is nto a corerct sentnece" (this is not a correct sentence)

If the user enters a misspelled word, the program should scan the .txt file, find the nearest correct word, replace the misspelled word with it, and output the corrected sentence, like:

"this is not a correct sentence" (from "tihs is nto a corerct sentnece")

Every incorrect word is checked against the .txt file.

My question is: how should I start coding this? Thanks.

Upvotes: 2

Views: 323

Answers (2)

Mike Samuel

Reputation: 120526

From "How to write a spelling corrector" by Peter Norvig:

The full details of an industrial-strength spell corrector like Google's would be more confusing than enlightening, but I figured that on the plane flight home, in less than a page of code, I could write a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.

Peter Norvig is a very talented computer scientist and a great explainer, so I highly recommend his blog.
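To give a rough idea of where to start, here is a minimal sketch of the general idea behind that kind of corrector: generate every string one edit away from the input word and keep the ones that appear in the word list. This is not Norvig's code. The names `edits1`, `candidates`, `correct_sentence` and the `WORDS` set are illustrative assumptions; `WORDS` stands in for the contents of the .txt file, and ties between candidates are broken alphabetically because the question's word list has no frequency data.

```python
import string


def edits1(word):
    """All strings one edit away from `word`:
    one deletion, adjacent swap, replacement, or insertion."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, known_words):
    """Known words reachable from `word` in at most one edit."""
    if word in known_words:
        return {word}
    return edits1(word) & known_words


def correct_sentence(sentence, known_words):
    """Replace each unknown word with its first candidate (alphabetical);
    leave the word unchanged when no candidate is found."""
    corrected = []
    for word in sentence.split():
        options = sorted(candidates(word.lower(), known_words))
        corrected.append(options[0] if options else word)
    return " ".join(corrected)


# Assumption: in a real program this set would be read from the .txt file.
WORDS = {"this", "is", "not", "a", "correct", "sentence"}

print(correct_sentence("tihs is nto a corerct sentnece", WORDS))
# prints: this is not a correct sentence
```

Every misspelling in the example sentence is a single swap of adjacent letters, so one edit is enough here. Words that are further from their correct spelling would need a second round of edits or a distance-based ranking.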

Upvotes: 3

Duncan

Reputation: 990

First, you need to find the misspelled words. Next, you need a way to assign a value to each word that might be the intended one. For example, "folor" could be "floor" with two letters swapped, or "color" with an 'f' in place of the 'c', and so on. In this case both candidates are very close: one has two swapped letters, and the other has a character replaced by one near it on the keyboard. You would have to weight each kind of mistake according to how common you think it is. In general, you could put each candidate with a low value into a priority queue and then pull from there. However, if the only case is the one you describe (swapped letters), the set of candidates is smaller and the problem is a little easier, but you would still have to assign a value to each word.

Note: "nto" could also be corrected to "ton". To rule out this possibility, you would have to check grammar as well.
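The scoring idea above could be sketched like this, assuming an edit distance that counts adjacent swaps as a single operation (the "optimal string alignment" variant of Damerau-Levenshtein) and Python's `heapq` as the priority queue. The function names, the cost parameters, and the `max_cost` cutoff are all illustrative assumptions; the weights in particular are made up and would need tuning against real typing mistakes.

```python
import heapq


def osa_distance(a, b, sub_cost=1.0, swap_cost=1.0, indel_cost=1.0):
    """Optimal string alignment distance between `a` and `b`.

    Allowed operations: insert, delete, substitute, and swap of two
    adjacent letters, each with its own (hypothetical) cost.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i * indel_cost
    for j in range(cols):
        d[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                        # delete
                d[i][j - 1] + indel_cost,                        # insert
                d[i - 1][j - 1] + (0.0 if same else sub_cost),   # substitute
            )
            # Two adjacent letters in the wrong order count as one swap.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap_cost)
    return d[rows - 1][cols - 1]


def ranked_candidates(word, known_words, max_cost=2.0, **costs):
    """Return (cost, word) pairs with cost <= max_cost, cheapest first."""
    heap = []
    for known in known_words:
        cost = osa_distance(word, known, **costs)
        if cost <= max_cost:
            heapq.heappush(heap, (cost, known))
    return [heapq.heappop(heap) for _ in range(len(heap))]


# Treating a swap as a more common mistake than a substitution:
print(ranked_candidates("folor", {"floor", "color", "flower"}, sub_cost=1.5))
# prints: [(1.0, 'floor'), (1.5, 'color')]

# "not" is one swap away, "ton" needs two edits, so "not" comes out first:
print(ranked_candidates("nto", {"not", "ton"}))
# prints: [(1.0, 'not'), (2.0, 'ton')]
```

With equal costs, "floor" and "color" tie at 1.0, which is exactly why the weights matter. The ranking only orders the candidates; it cannot tell whether "not" or "ton" fits the sentence, so that still needs a grammar or context check.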

Upvotes: 2
