Reputation: 13
The following code reads a text file line by line, filters out the bad lines, and writes the good lines to a new file. For some reason, the output file only contains lines with '-'; none of the other search terms seem to match anything.
Is there a problem with this code that might cause this to happen? Or is it more likely a problem with the text file?
import re

new = open('FilteredData.txt', 'w')
f = open('ClusteredData.txt', 'r')
line = f.readline()
while line:
    reResult = re.search(r'-', line, re.I)
    reResult1 = re.search(r'by', line, re.I)
    reResult2 = re.search(r'ft', line, re.I)
    reResult3 = re.search(r'feat', line, re.I)
    reResult4 = re.search(r'f\.', line, re.I)
    if reResult or reResult1 or reResult2 or reResult3 or reResult4:
        new.write(line)
    line = f.readline()
Upvotes: 1
Views: 125
Reputation: 140
I ran into a similar problem before, and it was caused by a text encoding issue. The code looks fine to me: I ran it on a UTF-8 encoded text file containing only ASCII characters, and it works. Is there any gibberish in your new text file? If there is, the problem is most likely the text file itself. Check which encoding the file was saved with and make sure it is read with that same encoding.
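One encoding that would produce this exact symptom is UTF-16, where every ASCII character is stored with an extra zero byte next to it: a single-character pattern such as '-' still matches the raw bytes, but 'by' or 'ft' do not, because the two letters are no longer adjacent. This is only a guess about the asker's file, not something the question confirms. The sketch below assumes Python 3; `sniff_bom` is a hypothetical helper name, and the sample text is made up for illustration.

```python
import codecs
import re

def sniff_bom(raw):
    """Return a codec name if the bytes start with a known BOM, else None.

    Hypothetical helper: it only recognises files that begin with a byte
    order mark, so a None result does not prove the file is plain ASCII.
    """
    # UTF-32 is checked first because its little-endian BOM begins with
    # the same two bytes as the UTF-16 little-endian BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return name
    return None

# Made-up sample line, encoded the way a UTF-16 text file would store it.
sample = "Artist - Song ft. Someone\n".encode("utf-16")

# Searching the undecoded bytes: the single character matches, the pair does not.
raw_match_dash = re.search(rb"-", sample) is not None
raw_match_ft = re.search(rb"ft", sample) is not None

# After decoding with the detected codec, the two-letter pattern matches again.
detected = sniff_bom(sample)
decoded = sample.decode(detected)
decoded_match_ft = re.search(r"ft", decoded, re.I) is not None
```

To try this on the real file, read the first few bytes with `open('ClusteredData.txt', 'rb').read(4)`, pass them to `sniff_bom`, and if a codec is reported, open the file with `open('ClusteredData.txt', 'r', encoding=...)` using that codec.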
Maybe try running the code on a small subset of the text file and see if it works.
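A small in-memory test like the following separates the filtering logic from the file itself. It is a sketch rather than the asker's exact code: the five searches are folded into one alternation, and `filter_lines`, `PATTERN`, and the sample lines are hypothetical names and data chosen for illustration.

```python
import re

# Same five terms as in the question, combined into one case-insensitive pattern.
PATTERN = re.compile(r"-|by|ft|feat|f\.", re.IGNORECASE)

def filter_lines(lines):
    """Return only the lines that contain at least one of the search terms."""
    return [line for line in lines if PATTERN.search(line)]

# Made-up lines: one per search term, plus one that should be dropped.
sample_lines = [
    "Song A - Artist\n",
    "Song B by Artist\n",
    "Song C FT Artist\n",
    "Song D feat Artist\n",
    "Song E f. Artist\n",
    "Plain title\n",
]

kept = filter_lines(sample_lines)
```

If every term is picked up here but not when reading from ClusteredData.txt, the regular expressions are fine and the difference lies in how the file's contents are being read.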
Upvotes: 1