Blind Gibbon
Blind Gibbon

Reputation: 11

Python 3: Basic Spell Check

#SPELL CHECK PROGRAM
def SpellCheck():

    import difflib
    import urllib.request

    #Downloads 10000 most used english words as txt file
    url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt'
    response = urllib.request.urlopen(url)
    data = response.read()      #
    text = data.decode('utf-8') #

    x = True
    while x == True:
        print("Enter your word: ")
        word = input(">>  ")

        clr(2)
        s(3)

        if (word.lower()) in text:
            print(word, " is spelt correctly")

        else:
            print("It doesn't seem like i know that word")

            #closewords = difflib.get_close_matches(word, text)
            wrongcount = 0
            correctcount = 0
            closewords = []

            ############

            for x in text:
                wrongcount = 0
                correctcount = 0

                for let in x:
                    for letter in word:
                        if let == letter:
                            correctcount += 1
                        else:
                            wrongcount +=1

                    if correctcount > len(word) - 4:
                        closewords.append(x)

                    else:
                        x = 0

            #############

            print("Perhaps you meant one of these: ")
            print( closewords )

        print("Would you like to try again?")
        again = input(">>  ")

        if again == 'Y' or again ==  'y' or again ==  'yes' or again ==  'Yes':
            x = True
        else:
            print("Okay")
            x = False

It Should take a list of the 10,000 most used english words and turn them into a list. if the user's word doesn't match one of the ones in the list it should: for every letter in every word check to see if the letter matches that of the one in the user's word. use this to see if any/many of the letters match in the two words. If they do, print it in a list of suggestions.

If the inputted word is spelt correctly; it will print that it is spelt correctly

If the inputed word is spelt incorrectly; it will print that it is spelt wrong, but it wont give any suggestions, even if for example; (I input 'pooison', it wont come up with anything, even though 'poison' is in the dictionary)

It should in that case print out 'poison'

Upvotes: 0

Views: 1576

Answers (1)

user3684792
user3684792

Reputation: 2611

For the suggestions - don't use your approach. A potentially more rigorous way of suggesting close words is using a smith waterman type alignment algorithm.

The basic idea is that you declare penalties for letter mismatches and insertions in one of the words. You can then, simply select the works in the dictionary that have a matching score that is greater than a certain threshold. The algorithm for words of length n and m is O(mn), so should be pretty reasonable for 10k short words.

Implementation-wise, this algorithm is used mostly for genetic sequence approximate matching, so most of the python implementations are geared towards this. If you can't find a general implementation, it might be worthwhile writing your own as a learning exercise. I can give you some pointers if you want?

Example

import numpy as np
from collections import defaultdict

def sub_function(letter1, letter2):
    
    if letter1 == letter2:
        return 1 #increase score if letter match
    else:
        return -1 # decrease score if letter mismatch

        
#######################################################        


def needleman_wunsch(si,sj,d=-2):
    #dimensions
    I =len(si)+1 ; J = len(sj)+1
        
    #define a dynamic programming matrix+backpointer matrix as a numpy array
    DP=np.zeros([len(si)+1,len(sj)+1]); PTR = DP.copy()

    #initialise top and left edges with the ga[ penalties
    for i in range(0,DP.shape[0]):
        DP[i,0]=d*(i)
    for i in range(0,DP.shape[1]):
        DP[0,i]=d*(i)
    
    #iterations over DP matrix, generate PTR matrix to all reconstruction 
    for i in range(1,I):
        for j in range(1,J):
            F_ij =[DP[i,j-1]+d,DP[i-1,j]+d,DP[i-1,j-1]+sub_function(si[i-1],sj[j-1])]
            DP[i,j]=max(F_ij)
            PTR[i,j]=F_ij.index(DP[i,j])
    
    #reconstructed strings 
    sI='';sJ=''
    l_c = [I-1,J-1]; p_c=PTR[l_c[0],l_c[1]]
    
    #main loop
    while l_c[0] >0 and l_c[1] >0:
        i=l_c[0]-1; j=l_c[1]-1 # character indices
        if PTR[l_c[0],l_c[1]] ==2:
            sI+=si[i]; sJ+=sj[j];
            l_c=[l_c[0]-1,l_c[1]-1] 
        elif PTR[l_c[0],l_c[1]] ==1:
            l_c=[l_c[0]-1,l_c[1]]
            sI+=si[i]; sJ+='-';
        elif PTR[l_c[0],l_c[1]] ==0:
            l_c=[l_c[0],l_c[1]-1]
            sI+='-'; sJ+=sj[j];
        
    #reversing strings as loop builds them backwards
    sI=sI[::-1]; sJ=sJ[::-1]
   
    return (sI,sJ,DP[-1,-1]) # a tuple of the two string+ score
    
def display(nw_tuple):
    print nw_tuple[0]
    print nw_tuple[1]
    print 'score: '+str(nw_tuple[2])
    

match= needleman_wunsch('acknowledgment','acknowlefdgment')
display(match)

Upvotes: 1

Related Questions