B.J96

Reputation: 11

Comparing a text file to another text file in Python?

I need to compare two text files. File 1 is a chat log and File 2 is a wordlist of key words. I am struggling to get the output I want, which is ideally a listing of every place where one of the key words from File 2 appears in the chat log (File 1). Any ideas on how I could achieve this?

Edit:

This is the code I'm currently trying to use; however, the output I get is that it prints both files to the text box within the GUI. The output I need is to show which lines of File 1 contain the words from File 2. Some of the code is taken from a keyword search feature I already have working.

import tkFileDialog

def wordlistsearch():
    filename = tkFileDialog.askopenfile(filetypes=(("Text files", "*.txt"),))  # file 1: chat log
    mtxt = filename.readline()
    i = 0
    filename2 = tkFileDialog.askopenfile(filetypes=(("Text files", "*.txt"),))  # file 2: wordlist

    while i < 10000:
        keystring = filename2.readline()
        print keystring
        participant = mtxt.split("(")[0]
        temppart2 = mtxt.split("(")[-1]
        keyword = temppart2.split(")")[0]
        if mtxt.find(str(keystring)) != -1:
            print i, ": ", mtxt
        i = i + 1
        mtxt = filename.readline()

Upvotes: 1

Views: 101

Answers (2)

Felippe Raposo

Reputation: 431

That's a very good question. Personally I think you can do this:

# Assumes the keywords file holds unique words separated by whitespace
with open('path_to_file_keywords') as keywords_file:
    # split() with no argument splits on spaces and newlines alike
    keywords_dict = {word: 0 for word in keywords_file.read().split()}

# Then read the chat log
with open('path_to_file_chat_log') as chat_log_file:
    chat_log_words = chat_log_file.read().split()  # all the words from the chat log

for word in chat_log_words:
    try:
        word_count = keywords_dict[word]
    except KeyError:
        continue  # The word is not a keyword
    word_count += 1  # increment the total
    keywords_dict[word] = word_count  # store the updated count in the dict

In the end, keywords_dict should hold the number of occurrences of each keyword.
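The same counting idea can be wrapped in a function so it is easier to try on small inputs. This is only a sketch: `count_keywords` and the sample data are hypothetical names made up for illustration, and it assumes keywords and chat words are separated by whitespace and must match exactly (same case, no attached punctuation).

```python
def count_keywords(keyword_lines, chat_lines):
    """Count how often each keyword appears as a whole word in the chat log.

    Both arguments are iterables of text lines, for example open file
    objects or plain lists of strings (an assumption of this sketch).
    """
    # Start every keyword at zero so keywords that never occur still show up.
    counts = {word: 0 for line in keyword_lines for word in line.split()}
    for line in chat_lines:
        for word in line.split():
            if word in counts:
                counts[word] += 1
    return counts


# Made-up sample data standing in for the two files.
sample_keywords = ["hello bye\n", "secret\n"]
sample_chat = ["alice: hello there\n", "bob: hello and bye\n"]
result = count_keywords(sample_keywords, sample_chat)
print(result)
```

With real files, the two arguments could be the file objects returned by `open(...)`, since iterating a file yields its lines.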

Upvotes: 0

Edd

Reputation: 1370

If you want to find all the words in File 1 that are also in File 2, you can use:

keywords = set([word for line in open("keyword_file","r") for word in line.split()])

words = set([word for line in open("log_file","r") for word in line.split()])

common = words.intersection(keywords)

To report each match as you read through File 1 instead:

keywords = set([word for line in open("keyword_file","r") for word in line.split()])

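for line in open("log_file","r"):
    # split the line into words; iterating over the string itself would yield single characters
    for word in line.split():
        if word in keywords:
            print("found {0} in line {1}".format(word, line))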
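Since the question asks which lines of the chat log contain a keyword, the loop above can be extended with `enumerate` to carry the line number along. The following is a rough sketch rather than a definitive solution: `find_keyword_lines` and the sample lines are hypothetical, and it assumes exact, whitespace-separated matches, so a word such as `hello,` with trailing punctuation would not match `hello`.

```python
def find_keyword_lines(keyword_lines, chat_lines):
    """Return (line_number, keyword, line_text) for every keyword hit.

    Both arguments are iterables of text lines, such as open file objects
    (an assumption of this sketch). Line numbers start at 1.
    """
    keywords = {word for line in keyword_lines for word in line.split()}
    matches = []
    for line_number, line in enumerate(chat_lines, start=1):
        for word in line.split():
            if word in keywords:
                matches.append((line_number, word, line.rstrip("\n")))
    return matches


# Made-up sample data standing in for the wordlist and the chat log.
sample_keywords = ["hello\n", "bye\n"]
sample_chat = [
    "alice (10:01): hello there\n",
    "bob (10:02): nothing here\n",
    "alice (10:03): ok bye\n",
]
hits = find_keyword_lines(sample_keywords, sample_chat)
for line_number, word, text in hits:
    print("line {0}: found {1!r} in {2!r}".format(line_number, word, text))
```

Returning a list of tuples instead of printing directly makes it straightforward to send the results to a GUI text box rather than the console.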

Upvotes: 1
