Steven M

Reputation: 15

Data being cut off when writing to a file in Python

I have the following code, which spellchecks words using a binary search. It compares the file to be spellchecked against a dictionary file and returns all misspelled words.

The spellchecker worked when I printed the misspelled words to the terminal, but now that I'm writing them to a file it only finds a fraction of the words.

I've also implemented a timer to time the search.

import re
import time

start_time = time.time()
f1=open('writefile.txt', 'w+')

def binS(lo,hi,target):

    if (lo>=hi):
        return False
    mid = (lo+hi) // 2
    piv = words[mid]
    if piv==target:
       return True
    if piv<target:
       return binS(mid+1,hi,target)
    return binS(lo,mid,target)

words = [s.strip("\n").lower() for s in open("words10k.txt")] 
words.sort() # sort the list

text = open("shakespeare.txt" , encoding="utf8")
content = text.read().split(" ")
content = [item.lower() for item in content]
content = ' '.join(content)
content = re.findall("[a-z]+", content)

for w in content:
    if not binS(0,len(words),w):
       f1.write(w)

print("--- %s seconds ---" % (time.time() - start_time))

I had this segment of code before, which worked by printing to the terminal. (Also, how could I write one word per line in the output file?)

for w in content:
    if not binS(0,len(words),w):
        print(w)

Search time by printing to the terminal : 2000 seconds

Search time by writing to a file : 38 seconds

Upvotes: 1

Views: 1562

Answers (1)

Kos

Reputation: 72241

I can't see where you're closing the file after opening it. Writes to files are buffered, so data still sitting in the buffer may not reach the disk until the file is flushed or closed; that could be the reason.

A more robust way is to use the with statement, which closes the file for you when you're done writing:

with open('writefile.txt', 'w+') as f1:
    for w in content:
        if not binS(0,len(words),w):
            f1.write(w)
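
For the one-word-per-line part of the question, here is a minimal sketch of the same pattern. The `misspelled` list and the temporary output path are hypothetical stand-ins so the snippet runs on its own; in the real script you would keep the loop over `content` and the `binS` check.

```python
import os
import tempfile

# Hypothetical stand-in for the misspelled words found by the search.
misspelled = ["thou", "hath", "wherefore"]

# Temporary location used only for this illustration.
out_path = os.path.join(tempfile.mkdtemp(), "writefile.txt")

# The with block closes (and therefore flushes) the file on exit,
# even if an exception is raised inside it.
with open(out_path, "w") as f1:
    for w in misspelled:
        f1.write(w + "\n")  # write() does not add a newline on its own

# Read the file back to confirm every word landed on its own line.
with open(out_path) as f:
    written = f.read().splitlines()
```

Since write() never appends a line ending, adding "\n" explicitly is what puts each word on its own line.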

In other news:

  • try using a set to store words, so that you do efficient lookups: if w not in words: ...
  • try rewriting the loop using f1.writelines and a generator expression
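
Both suggestions could be combined roughly as follows. This is only a sketch: `dictionary_lines` and `text` are made-up stand-ins for words10k.txt and shakespeare.txt, and an in-memory `io.StringIO` stands in for the open output file.

```python
import io
import re

# Hypothetical stand-ins for the dictionary file and the text to check.
dictionary_lines = ["the\n", "quick\n", "brown\n", "fox\n"]
text = "The quikc brown fox, the QUICK foxx"

# A set gives average constant-time membership tests, so no sorting
# or hand-written binary search is needed.
words = {s.strip().lower() for s in dictionary_lines}

content = re.findall("[a-z]+", text.lower())

out = io.StringIO()  # stands in for the file opened in the with block

# writelines() adds no line endings itself, so each word gets "\n".
out.writelines(w + "\n" for w in content if w not in words)

result = out.getvalue()
```

With a real file, the writelines call would go inside the with block in place of the explicit loop.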

Upvotes: 2
