new recruit 21

Reputation: 111

Python Text Document Translation Comparison

Fairly simple question. I'm trying to create a "translation comparison" program that reads and compares two documents and then returns every word that isn't in the other document. This is for a beginner class, so I'm trying to avoid using obscure internal methods, even if that means less efficient code. This is what I have so far...

import re


def translation_comparison():
    file1 = open("Desktop/file1.txt", "r")
    file2 = open("Desktop/file2.txt", "r")
    text1 = file1.read()
    text2 = file2.read()
    text1 = re.findall(r'\w+', text1)
    text2 = re.findall(r'\w+', text2)
    for item in text2:
        if item not in text1:
            return item
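Note that return item exits the function at the first word of file2 that is missing from file1, so the function returns at most one word (or None when every word matches). Below is a minimal sketch that collects every unmatched word in both directions instead, still using only lists and the in operator. The helper name unmatched_words, and passing it the file contents as strings rather than paths, are assumptions for illustration:

```python
import re


def unmatched_words(text1, text2):
    """Return (words only in text1, words only in text2), in order of first appearance.

    Hypothetical helper: takes the already-read file contents as strings.
    """
    words1 = re.findall(r'\w+', text1)
    words2 = re.findall(r'\w+', text2)

    only_in_1 = []
    for word in words1:
        # Collect instead of returning, and skip words already recorded.
        if word not in words2 and word not in only_in_1:
            only_in_1.append(word)

    only_in_2 = []
    for word in words2:
        if word not in words1 and word not in only_in_2:
            only_in_2.append(word)

    return only_in_1, only_in_2


print(unmatched_words("this is a test", "this a test"))  # (['is'], [])

# With the files from the question (paths as in the original code):
# with open("Desktop/file1.txt") as f1, open("Desktop/file2.txt") as f2:
#     print(unmatched_words(f1.read(), f2.read()))
```

Checking membership in a list is slower than in a set for long documents, but it keeps the code within the beginner-level tools the question asks for.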

Upvotes: 0

Views: 215

Answers (2)

Navith

Reputation: 1069

Assuming you want a word-by-word comparison by position, comparing a b c against b a c would yield the pair (a, b) and then the pair (b, a), rather than the None your original code returns:

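import itertools
import string


def read_by_word(file):
    """Yield whitespace-separated words from file, reading one character at a time."""
    chars = []
    while True:
        char = file.read(1)
        if not char:
            # An empty string means end of file: emit the last word if the
            # file did not end with whitespace, then stop.
            if chars:
                yield "".join(chars)
            break
        if char in string.whitespace:
            # Whitespace ends the current word; runs of whitespace are skipped.
            if chars:
                yield "".join(chars)
                chars = []
        else:
            chars.append(char)


def translation_comparison():
    """Yield (word1, word2) for each position where the two files differ."""
    with open("file1.txt") as file1, open("file2.txt") as file2:
        words1 = read_by_word(file1)
        words2 = read_by_word(file2)

        # zip_longest pads the shorter file with None, so extra words are reported too
        for (word1, word2) in itertools.zip_longest(words1, words2, fillvalue=None):
            if word1 != word2:
                yield (word1, word2)


# translation_comparison() is a generator, so iterate over it to get the results:
for pair in translation_comparison():
    print(pair)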
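To try the position-by-position comparison without creating any files, here is a small sketch that feeds in-memory text through io.StringIO, which behaves like an open text file. The name compare_by_position is hypothetical, and for brevity it splits with str.split() instead of reading one character at a time:

```python
import io
import itertools


def compare_by_position(file1, file2):
    """Yield (word1, word2) for every position where the two files differ.

    The shorter file is padded with None, which is zip_longest's default fillvalue.
    """
    words1 = file1.read().split()
    words2 = file2.read().split()
    for word1, word2 in itertools.zip_longest(words1, words2):
        if word1 != word2:
            yield (word1, word2)


pairs = list(compare_by_position(io.StringIO("a b c"), io.StringIO("b a c")))
print(pairs)  # [('a', 'b'), ('b', 'a')]
```

Because the comparison is by position, a single extra word shifts everything after it: comparing "this is a test" with "this a test" reports three pairs, ('is', 'a'), ('a', 'test') and ('test', None), not just the missing "is".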

Upvotes: 1

reticentroot

Reputation: 3682

You might try something like this:

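# Test data
# file1.txt contains: this is a test
# file2.txt contains: this a test
# Output:
# is

def translation_comparison():
    # Read each file and split its contents into a list of words
    with open("file1.txt", "r") as f1:
        words1 = f1.read().split()
    with open("file2.txt", "r") as f2:
        words2 = f2.read().split()

    # Print every word from file1 that does not appear anywhere in file2
    for word in words1:
        if word not in words2:
            print(word)


translation_comparison()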
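One difference from the question's code: str.split() cuts only on whitespace, so punctuation stays attached and "test." will not match "test", whereas the question's re.findall(r'\w+', ...) keeps only runs of word characters. A small comparison on a made-up sample sentence:

```python
import re

sample = "This is a test. Is it?"

# split() cuts only on whitespace; punctuation stays attached to the words
print(sample.split())
# ['This', 'is', 'a', 'test.', 'Is', 'it?']

# \w+ matches runs of letters, digits and underscores, dropping punctuation
print(re.findall(r'\w+', sample))
# ['This', 'is', 'a', 'test', 'Is', 'it']
```

Either way the comparison is case-sensitive, so "This" and "this" still count as different words.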

Also, it is good practice to open files with

with open("file1.txt", "r") as f1:
    words1 = f1.read().split()

because a with block closes the file automatically when the block ends, even if an exception is raised inside it. Python will usually close a file object on its own once nothing refers to it any more, but that timing isn't guaranteed, so it is a good habit to release the file explicitly, either by using with or by calling

file1.close()

when you are done.
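A quick way to see what with does is to check the file object's closed attribute inside and after the block. The sketch below also shows the roughly equivalent try/finally form with an explicit close(); the temporary file it writes exists only so the example runs anywhere:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("this is a test")

# Using with: the file is closed automatically when the block ends,
# even if an exception is raised inside it.
with open(path) as f1:
    words = f1.read().split()
    print(f1.closed)  # False: still open inside the block
print(f1.closed)      # True: closed on leaving the block

# Roughly equivalent manual version with an explicit close().
f2 = open(path)
try:
    words_again = f2.read().split()
finally:
    f2.close()        # runs whether or not read() raised
print(f2.closed)      # True

os.remove(path)       # clean up the temporary file
print(words == words_again)  # True
```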

Upvotes: 1
