spell checker with a twist

Question

I have a quick question about a spell checker, but with a twist. Effectively, it is vaguer than your regular spell checker: rather than correcting your words, it judges how correct you are based on how close you get to the target word. For instance, if one string differs from another by two characters or fewer, e.g. "hello" and "hallo", it will state "nearly there". Below is the code I attempted.

def spell_checker(correct, guess):
    if guess==correct:
        print("Correct")
    if guess!=correct:
        for g in guess:
            for f in correct:
                if g!=f:
                    print("nearly there")
                else:
                    print("Wrong")

Obviously I realise this is quite a crude attempt, since it does not take the range of mistakes into account, but to be honest, I could not find a way of incorporating the range of mistakes in a word. Even when I looked at answers involving nltk, I did not know where to start.

The output when applying the "hello", "hallo" example was as follows:

Wrong almost almost almost almost almost almost almost almost almost almost almost Wrong Wrong almost almost almost Wrong Wrong almost almost almost almost almost Wrong

I believe it is going through each character and stating whether one character is similar to the other. I would really appreciate any help on this.

tobias_k · Accepted Answer

The problem with your code is that you are comparing every character in the first word with every character in the other word. If you want to compare just the characters in the same position, a very simple way would be to zip the two words and count the mismatched characters:

>>> a, b = "hello", "hallo"
>>> sum(x != y for x, y in zip(a, b))
1

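But this will of course break down if the words do not have the same length, since zip stops at the end of the shorter word and ignores the rest. Also, it does not work well with missing or superfluous characters, because every character after the gap is shifted by one position: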

>>> a, b = "correct", "corect"
>>> sum(x != y for x, y in zip(a, b))
3
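To make the length problem concrete, here is a minimal sketch that pads the shorter word instead of truncating. The helper name `positional_mismatches` is made up for illustration; `itertools.zip_longest` is from the standard library and fills the missing positions with `None`.

```python
from itertools import zip_longest

def positional_mismatches(a, b):
    """Count positions where a and b differ; a missing character counts as a mismatch."""
    # zip_longest pads the shorter word with None, which never equals a character.
    return sum(x != y for x, y in zip_longest(a, b))

print(positional_mismatches("hello", "hallo"))
print(positional_mismatches("correct", "corect"))
```

Padding fixes the truncation, but a single dropped letter still shifts everything after it, so position-by-position comparison keeps overestimating the number of mistakes.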

A better approach would be to calculate the edit distance between the two strings. If you do not want to implement the algorithm yourself, you could e.g. use difflib.ndiff:

>>> import difflib
>>> list(difflib.ndiff(a, b))
['  c', '  o', '- r', '  r', '  e', '  c', '  t']
>>> sum(d[0] != " " for d in difflib.ndiff(a, b))
1

Note, however, that this will count replacements twice: once for the deleted character, and once for the inserted character. You could fix this by e.g. not adding 1 if you get a "+" followed by a "-" or vice versa, which is left as an exercise to the interested reader.
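If you would rather implement the edit distance yourself, the following is a minimal sketch of the classic Levenshtein distance using dynamic programming. This is a different technique from the difflib approach above, and the function name `edit_distance` is just an illustrative choice. It counts a replacement as a single edit.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(edit_distance("hello", "hallo"))
print(edit_distance("correct", "corect"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second word rather than with the product of both lengths.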

Anyway, just count the number of mismatched characters, and print "almost" if that number is small enough.

import difflib

def spell_checker(correct, guess):
    if guess==correct:
        print("correct")
    elif sum(d[0] != " " for d in difflib.ndiff(correct, guess)) <= 2:
        print("almost")
    else:
        print("wrong")
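As a possible variation, here is a sketch that returns the verdict instead of printing it and counts a replacement only once, by reading the opcodes of `difflib.SequenceMatcher` rather than the `ndiff` output. The names `count_edits`, `classify_guess` and `max_edits` are made up for this example, and the count is what difflib's matching yields, which is not guaranteed to be the minimal edit distance.

```python
import difflib

def count_edits(correct, guess):
    """Count changed characters, treating a replaced run as one edit per character."""
    matcher = difflib.SequenceMatcher(None, correct, guess, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For 'replace' take the longer side; for 'insert' or 'delete' one side is empty.
            total += max(i2 - i1, j2 - j1)
    return total

def classify_guess(correct, guess, max_edits=2):
    """Return 'correct', 'almost' or 'wrong' for a guess (max_edits is an assumed threshold)."""
    if guess == correct:
        return "correct"
    if count_edits(correct, guess) <= max_edits:
        return "almost"
    return "wrong"

for guess in ("hello", "hallo", "xyz"):
    print(guess, classify_guess("hello", guess))
```

Returning the verdict makes the function easier to test and lets the caller decide how to display it.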
