Reputation: 15
I have the following code, which spell-checks words using a binary search. It compares the file to be spell-checked against a dictionary file and returns all misspelled words.
The spellchecker worked when I printed the misspelled words to the terminal, but now that I'm writing them to a file it only finds a fraction of the words.
I've also implemented a timer to time the search.
import re
import time

start_time = time.time()
f1 = open('writefile.txt', 'w+')

def binS(lo, hi, target):
    if (lo >= hi):
        return False
    mid = (lo + hi) // 2
    piv = words[mid]
    if piv == target:
        return True
    if piv < target:
        return binS(mid + 1, hi, target)
    return binS(lo, mid, target)

words = [s.strip("\n").lower() for s in open("words10k.txt")]
words.sort()  # sort the list

text = open("shakespeare.txt", encoding="utf8")
content = text.read().split(" ")
content = [item.lower() for item in content]
content = ' '.join(content)
content = re.findall("[a-z]+", content)

for w in content:
    if not binS(0, len(words), w):
        f1.write(w)

print("--- %s seconds ---" % (time.time() - start_time))
I previously had this segment of code, which worked by printing to the terminal. (Also, how could I write one word per line in the output file?)
for w in content:
    if not binS(0, len(words), w):
        print(w)
Search time by printing to the terminal : 2000 seconds
Search time by writing to a file : 38 seconds
Upvotes: 1
Views: 1562
Reputation: 72241
I can't see where you close the file after opening it. Writes to files are buffered, so without a close (or flush) some of the output may never reach the file, which would explain the missing words.
A more proper way is to use the with statement, which closes the file for you when you're done writing:
with open('writefile.txt', 'w+') as f1:
    for w in content:
        if not binS(0, len(words), w):
            f1.write(w)
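To also get one word per line, as the question asks, a newline can be appended to each write. The following is only a minimal sketch: the helper name write_misspelled and the sample lists are illustrative assumptions, and a plain membership test stands in for the call to binS.

```python
import os
import tempfile

def write_misspelled(content, words, path):
    """Write each word of content that is not in words to path, one per line."""
    with open(path, "w") as f1:
        for w in content:
            if w not in words:      # stand-in for `not binS(0, len(words), w)`
                f1.write(w + "\n")  # explicit newline gives one word per line
    # Leaving the with block closes f1, which flushes buffered writes to disk.

# Illustrative data, not the real dictionary or text files.
words = sorted(["be", "not", "or", "to"])
content = ["to", "bee", "or", "nott", "to", "be"]

out_path = os.path.join(tempfile.mkdtemp(), "writefile.txt")
write_misspelled(content, words, out_path)

with open(out_path) as f:
    result = f.read()
```

Reading the file back only after the with block has ended is what guarantees that every buffered word has actually been written.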
In other news:
You can use a set to store words, so that you do efficient lookups: if w not in words: ...
You can use f1.writelines and a generator expression to write the output.
Upvotes: 2
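The set and writelines suggestions from the answer can be combined roughly as follows. This is a sketch under stated assumptions: the function names misspelled and write_words, the sample sentence, and the tiny dictionary are all made up for illustration, and the tokenising regex is the one from the question.

```python
import os
import re
import tempfile

def misspelled(text, dictionary):
    """Return the words in text that are not in dictionary, in order."""
    known = set(w.lower() for w in dictionary)   # set gives fast membership tests
    tokens = re.findall("[a-z]+", text.lower())  # same tokenising as the question
    return [w for w in tokens if w not in known]

def write_words(path, text, dictionary):
    """Write the misspelled words to path, one per line."""
    with open(path, "w") as f1:
        # writelines does not add newlines itself, so the generator appends them.
        f1.writelines(w + "\n" for w in misspelled(text, dictionary))

# Illustrative data, not the real dictionary or text files.
sample_dictionary = ["to", "be", "or", "not"]
sample_text = "To bee, or nott to be"

out_path = os.path.join(tempfile.mkdtemp(), "writefile.txt")
write_words(out_path, sample_text, sample_dictionary)

with open(out_path) as f:
    written = f.read()
```

With a set there is no need to sort the word list or to write a binary search at all, since the `in` test on a set is a hash lookup.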