I'm writing a simple command-line Python program and validating the user input with a spell checker. I found http://norvig.com/spell-correct.html on SO earlier and am using it to validate what the user enters. In my case, I'm validating the input against a list of BART stations: the user has to enter the station name exactly or get a suggestion from the spell checker. Here is the list of BART stations I'm validating against:
Lake Merritt
Daly City
Fruitvale
Coliseum/Oakland Airport
San Leandro
.
.
.
The difference between what I'm doing and the sample code I found is that I'm validating against multi-word names ("Daly City"), not just single words ("Fruitvale"). I'm not very good with regex and Python, and I'm trying to figure out how to read each line and match everything from the beginning of the line to the end. Here's the code I'm trying to change:
def words(text): return re.findall('[a-z]+', text.lower())
where text is the contents of the big text file used in the example.
I tried:
def words(text):
    lines=text.split('\n')
    return re.search('[a-z]+', lines)
I thought that would work, since (to me at least) it means I'm searching each line for at least one lowercase character. However, I got this back:
Traceback (most recent call last):
  File "spell.py", line 15, in <module>
    NWORDS = train(words(file('stations.txt').read()))
  File "spell.py", line 6, in words
    return re.search('[a-z]+', lines)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or buffer
I'm not really sure how to do this. Can anyone help?
Perhaps use difflib instead of Norvig's spelling corrector. difflib has a get_close_matches function which can help you guess which of the BART stations is closest to the string the user entered. For example,
import difflib
bart_stations = ['Lake Merritt', 'Daly City', 'Fruitvale', 'Coliseum/Oakland Airport',
                 'San Leandro']
while True:
    text = raw_input('Enter BART station: ')
    if not text: break  # Pressing Enter quits
    guess = difflib.get_close_matches(text, bart_stations, n=1, cutoff=0)[0]
    print('Closest match: {g}'.format(g=guess))
Running the script yields:
% test.py
Enter BART station: Merit
Closest match: Lake Merritt
Enter BART station: Fruity
Closest match: Fruitvale
Enter BART station: Coli
Closest match: Daly City
Enter BART station: Col
Closest match: Coliseum/Oakland Airport
Enter BART station: Lean
Closest match: San Leandro
Enter BART station:
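Note that with cutoff=0 every candidate qualifies, so the script always prints some station, which is how "Coli" ends up matched to "Daly City" in the run above: difflib's similarity ratio penalizes length differences, so a short prefix of a long name scores low. If you would rather reject weak matches, something like the following might work. It is only a sketch, assuming Python 3; suggest_station is a hypothetical helper (not part of difflib), and the 0.5 cutoff is an arbitrary starting value you would need to tune.

```python
import difflib

STATIONS = ['Lake Merritt', 'Daly City', 'Fruitvale',
            'Coliseum/Oakland Airport', 'San Leandro']

def suggest_station(text, stations, cutoff=0.5):
    """Return the closest station name, or None if nothing scores >= cutoff.

    Hypothetical helper for illustration; the default cutoff is an assumption.
    """
    # Compare in lower case so 'daly city' still matches 'Daly City'.
    by_lower = {name.lower(): name for name in stations}
    matches = difflib.get_close_matches(
        text.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
    # get_close_matches returns a (possibly empty) list, best match first.
    return by_lower[matches[0]] if matches else None

if __name__ == '__main__':
    for attempt in ['Merit', 'Fruity', 'Coli']:
        print(attempt, '->', suggest_station(attempt, STATIONS))
```

When the helper returns None, you can ask the user to try again instead of showing a misleading suggestion.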
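As for the TypeError in the question: re.search expects a single string, but text.split('\n') returns a list, so passing lines to it fails. If each line should count as one "word", you probably don't need a regex at all. A minimal sketch, assuming Python 3 and one station name per line; station_names is a hypothetical replacement for words, and the sample string merely stands in for the contents of stations.txt.

```python
def station_names(text):
    """Return one entry per non-blank line, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Stand-in for the contents of stations.txt (an assumption for this example).
sample = "Lake Merritt\nDaly City\nFruitvale\n\nSan Leandro\n"
names = station_names(sample)
```

The resulting list can be passed to difflib.get_close_matches as the possibilities. If you feed whole lines to Norvig's code instead, keep in mind that, as far as I can tell, its edit alphabet only covers a-z, so corrections that require inserting or replacing a space or '/' would not be found unless you extend it.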