Reputation: 111
Fairly simple question. I'm trying to create a "translation comparison" program that reads and compares two documents and then returns every word that isn't in the other document. This is for a beginner class, so I'm trying to avoid using obscure internal methods, even if that means less efficient code. This is what I have so far...
def translation_comparison():
    import re
    file1 = open("Desktop/file1.txt","r")
    file2 = open("Desktop/file2.txt","r")
    text1 = file1.read()
    text2 = file2.read()
    text1 = re.findall(r'\w+',text1)
    text2 = re.findall(r'\w+',text2)
    for item in text2:
        if item not in text1:
            return item
Upvotes: 0
Views: 215
Reputation: 1069
Assuming you want a word-by-word comparison, so that comparing "a b c" against "b a c" would return both "a" and "b", then "b" and "a" (as opposed to None, as in your original code):
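import string
import itertools

class FileExhausted(Exception):
    """Raised when the end of the file is reached."""

def read_by_word(file):
    """Yield whitespace-separated words, reading one character at a time."""
    def read_word():
        # Yield the characters of one word, stopping at the first whitespace.
        while True:
            char = file.read(1)
            if char:
                if char in string.whitespace:
                    break
                yield char
            else:
                raise FileExhausted
    while True:
        letters = []
        try:
            for char in read_word():
                letters.append(char)
        except FileExhausted:
            # End of file: keep a final word that has no trailing whitespace.
            if letters:
                yield "".join(letters)
            break
        else:
            if letters:
                yield "".join(letters)

def translation_comparison():
    """Yield (word1, word2) for every position where the two files differ."""
    with open("file1.txt") as file1, open("file2.txt") as file2:
        words1 = read_by_word(file1)
        words2 = read_by_word(file2)
        # zip_longest pads the shorter file with None instead of stopping early.
        for (word1, word2) in itertools.zip_longest(words1, words2, fillvalue=None):
            if word1 != word2:
                yield (word1, word2)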
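If the files are small enough to read in one go, the same position-by-position idea can be sketched more briefly with str.split(). This is only an illustration: compare_by_position is a made-up name, and it takes the two texts as strings rather than opening files, so it is easy to try out.

```python
import itertools

def compare_by_position(text1, text2):
    """Return (word1, word2) pairs for every position where the words differ.

    Hypothetical helper: takes the texts as strings instead of file objects.
    """
    words1 = text1.split()
    words2 = text2.split()
    differences = []
    # zip_longest keeps going until the longer text runs out, filling with None.
    for word1, word2 in itertools.zip_longest(words1, words2, fillvalue=None):
        if word1 != word2:
            differences.append((word1, word2))
    return differences

print(compare_by_position("a b c", "b a c"))  # [('a', 'b'), ('b', 'a')]
```

Returning a list instead of yielding means the caller does not have to know about generators, which may suit a beginner class better.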
Upvotes: 1
Reputation: 3682
You might try something like this:
# Test data
# file1.txt = this is a test
# file2.txt = this a test
# Result:
# is
def translation_comparison():
    with open("file1.txt", 'r') as f1:
        f1 = f1.read().split()
    with open("file2.txt", 'r') as f2:
        f2 = f2.read().split()
    for word in f1:
        if word not in f2:
            print(word)
translation_comparison()
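The loop above only reports words from file1 that are missing from file2. If you want the result for both documents, and want it returned rather than printed, a rough sketch could look like the following. The name words_missing_from and the sample strings are made up for illustration, and the texts are plain strings here so the example runs without any files.

```python
def words_missing_from(words, other_words):
    """Return the words that appear in `words` but not in `other_words`.

    Hypothetical helper; each missing word is listed once, in order of
    first appearance.
    """
    missing = []
    for word in words:
        if word not in other_words and word not in missing:
            missing.append(word)
    return missing

# Sample texts standing in for the contents of the two files (assumed data).
words1 = "this is a test".split()
words2 = "this a test too".split()

only_in_1 = words_missing_from(words1, words2)
only_in_2 = words_missing_from(words2, words1)
print(only_in_1)  # ['is']
print(only_in_2)  # ['too']
```

Collecting the words in a list and returning it at the end avoids the problem of a return inside the loop, which stops at the first missing word.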
Also, it is good practice to use
with open("file1.txt", 'r') as f1:
    f1 = f1.read().split()
because a with block closes the file automatically as soon as the block exits, even if an error occurs inside it. Python is pretty good at releasing resources on its own, but it is always a good habit to release them explicitly, either by using with or by calling
file1.close()
when you are done.
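To see the difference, here is a small sketch that checks the file object's closed attribute inside and after a with block, and then shows the manual open/close version. The temporary file and the variable names are only there to make the example self-contained.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on file1.txt existing.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("this is a test")

with open(path, "r") as f1:
    words = f1.read().split()
    closed_inside = f1.closed   # still open while inside the block

closed_after = f1.closed        # closed automatically when the block exited

# The manual equivalent: try/finally makes sure close() runs even on errors.
f2 = open(path, "r")
try:
    words_again = f2.read().split()
finally:
    f2.close()

os.remove(path)
print(closed_inside, closed_after, f2.closed)  # False True True
```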
Upvotes: 1