Reputation: 11
#SPELL CHECK PROGRAM
def SpellCheck():
import difflib
import urllib.request
#Downloads 10000 most used english words as txt file
url = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english.txt'
response = urllib.request.urlopen(url)
data = response.read() #
text = data.decode('utf-8') #
x = True
while x == True:
print("Enter your word: ")
word = input(">> ")
clr(2)
s(3)
if (word.lower()) in text:
print(word, " is spelt correctly")
else:
print("It doesn't seem like i know that word")
#closewords = difflib.get_close_matches(word, text)
wrongcount = 0
correctcount = 0
closewords = []
############
for x in text:
wrongcount = 0
correctcount = 0
for let in x:
for letter in word:
if let == letter:
correctcount += 1
else:
wrongcount +=1
if correctcount > len(word) - 4:
closewords.append(x)
else:
x = 0
#############
print("Perhaps you meant one of these: ")
print( closewords )
print("Would you like to try again?")
again = input(">> ")
if again == 'Y' or again == 'y' or again == 'yes' or again == 'Yes':
x = True
else:
print("Okay")
x = False
It Should take a list of the 10,000 most used english words and turn them into a list. if the user's word doesn't match one of the ones in the list it should: for every letter in every word check to see if the letter matches that of the one in the user's word. use this to see if any/many of the letters match in the two words. If they do, print it in a list of suggestions.
If the inputted word is spelt correctly; it will print that it is spelt correctly
If the inputed word is spelt incorrectly; it will print that it is spelt wrong, but it wont give any suggestions, even if for example; (I input 'pooison', it wont come up with anything, even though 'poison' is in the dictionary)
It should in that case print out 'poison'
Upvotes: 0
Views: 1576
Reputation: 2611
For the suggestions - don't use your approach. A potentially more rigorous way of suggesting close words is using a smith waterman type alignment algorithm.
The basic idea is that you declare penalties for letter mismatches and insertions in one of the words. You can then, simply select the works in the dictionary that have a matching score that is greater than a certain threshold. The algorithm for words of length n and m is O(mn), so should be pretty reasonable for 10k short words.
Implementation-wise, this algorithm is used mostly for genetic sequence approximate matching, so most of the python implementations are geared towards this. If you can't find a general implementation, it might be worthwhile writing your own as a learning exercise. I can give you some pointers if you want?
Example
import numpy as np
from collections import defaultdict
def sub_function(letter1, letter2):
if letter1 == letter2:
return 1 #increase score if letter match
else:
return -1 # decrease score if letter mismatch
#######################################################
def needleman_wunsch(si,sj,d=-2):
#dimensions
I =len(si)+1 ; J = len(sj)+1
#define a dynamic programming matrix+backpointer matrix as a numpy array
DP=np.zeros([len(si)+1,len(sj)+1]); PTR = DP.copy()
#initialise top and left edges with the ga[ penalties
for i in range(0,DP.shape[0]):
DP[i,0]=d*(i)
for i in range(0,DP.shape[1]):
DP[0,i]=d*(i)
#iterations over DP matrix, generate PTR matrix to all reconstruction
for i in range(1,I):
for j in range(1,J):
F_ij =[DP[i,j-1]+d,DP[i-1,j]+d,DP[i-1,j-1]+sub_function(si[i-1],sj[j-1])]
DP[i,j]=max(F_ij)
PTR[i,j]=F_ij.index(DP[i,j])
#reconstructed strings
sI='';sJ=''
l_c = [I-1,J-1]; p_c=PTR[l_c[0],l_c[1]]
#main loop
while l_c[0] >0 and l_c[1] >0:
i=l_c[0]-1; j=l_c[1]-1 # character indices
if PTR[l_c[0],l_c[1]] ==2:
sI+=si[i]; sJ+=sj[j];
l_c=[l_c[0]-1,l_c[1]-1]
elif PTR[l_c[0],l_c[1]] ==1:
l_c=[l_c[0]-1,l_c[1]]
sI+=si[i]; sJ+='-';
elif PTR[l_c[0],l_c[1]] ==0:
l_c=[l_c[0],l_c[1]-1]
sI+='-'; sJ+=sj[j];
#reversing strings as loop builds them backwards
sI=sI[::-1]; sJ=sJ[::-1]
return (sI,sJ,DP[-1,-1]) # a tuple of the two string+ score
def display(nw_tuple):
print nw_tuple[0]
print nw_tuple[1]
print 'score: '+str(nw_tuple[2])
match= needleman_wunsch('acknowledgment','acknowlefdgment')
display(match)
Upvotes: 1