I need to compare two text files. File 1 is a chat log and file 2 is a word list containing keywords. I am struggling to get the output I want, which is ideally every occurrence in the chat log (file 1) of any of the keywords in file 2. Any ideas on how I could achieve this?
Edit:
This is the code I'm currently trying to use. However, the output I get is both files printed to the text box in the GUI. What I need is to show the lines of file 1 in which the words from file 2 occur. Some of the code is taken from a keyword search feature I already have working.
```python
def wordlistsearch():
    filename = tkFileDialog.askopenfile(filetypes=(("Text files", "*.txt"),))  # file1
    mtxt = filename.readline()
    i = 0
    filename2 = tkFileDialog.askopenfile(filetypes=(("Text files", "*.txt"),))  # file2
    while i < 10000:
        keystring = filename2.readline()
        print keystring
        participant = mtxt.split("(")[0]
        temppart2 = mtxt.split("(")[-1]
        keyword = temppart2.split(")")[0]
        if mtxt.find(str(keystring)) != -1:
            print i, ": ", mtxt
        i = i + 1
        mtxt = filename.readline()
```
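A likely cause of the output described above: the loop advances both files in lockstep, so each chat line is compared against only one keyword. `readline()` also keeps the trailing newline, so `mtxt.find(keystring)` only matches a keyword at the very end of a line, and once the keyword file runs out `readline()` returns `''`, for which `find` returns 0, so every remaining log line is printed. A minimal sketch of the fix, written for Python 3 (the question uses Python 2): load all keywords once, then check every log line against all of them. `find_keyword_lines` is a hypothetical helper name, and matching is plain substring matching like the original `find` call, so "hello" would also match inside "othello".

```python
def find_keyword_lines(log_lines, keywords):
    """Return (line_number, keyword, line) for every keyword found in each log line.

    Hypothetical helper: log_lines is any iterable of strings (e.g. an open
    file object) and keywords is an iterable of keyword strings.
    """
    # Read every keyword once, stripping the trailing newline and skipping blank lines
    wanted = [k.strip() for k in keywords if k.strip()]
    hits = []
    # Compare each log line against *all* keywords, not one keyword per line
    for line_number, line in enumerate(log_lines, start=1):
        for keyword in wanted:
            if keyword in line:  # substring match, like str.find(...) != -1
                hits.append((line_number, keyword, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    log = ["alice (admin): hello there\n", "bob (guest): bye now\n"]
    keywords = ["hello\n", "bye\n", "\n"]
    for line_number, keyword, line in find_keyword_lines(log, keywords):
        print(line_number, ":", keyword, "->", line)
```

Because iterating a file object yields its lines, the two file objects returned by `tkFileDialog.askopenfile` (`tkinter.filedialog.askopenfile` in Python 3) could be passed in directly, and the returned tuples written to the GUI text box instead of printed.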
That's a very good question. Personally I think you can do this:
```python
# I suppose the keywords file has non-repeated words separated by whitespace
with open('path_to_file_keywords') as keywords_file:
    # split() with no argument splits on any whitespace, which also removes '\n'
    keywords_dict = {word: 0 for word in keywords_file.read().split()}

# Then read the chat log
with open('path_to_file_chat_log') as chat_log_file:
    # Generator over every whitespace-separated word in the chat log
    chat_log_words_generator = (word for line in chat_log_file for word in line.split())
    for word in chat_log_words_generator:
        try:
            word_count = keywords_dict[word]
        except KeyError:
            continue  # The word is not a keyword
        word_count += 1  # increment the total
        keywords_dict[word] = word_count  # override the value of the count in the dict
```
In the end, keywords_dict should hold the number of occurrences of every keyword.
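The same tally can be kept with `collections.Counter` from the standard library instead of the try/except dictionary update. This is a sketch for Python 3 with a hypothetical function name, `count_keywords`; like the answer above it splits on whitespace, so "hello," with trailing punctuation would not count as "hello", and it reports totals rather than the lines where matches occur.

```python
from collections import Counter


def count_keywords(log_lines, keywords):
    """Count how often each keyword appears as a whitespace-separated word."""
    keyword_set = set(keywords)
    # Seed every keyword with 0 so keywords that never appear are still listed
    counts = Counter({k: 0 for k in keyword_set})
    for line in log_lines:
        # update() with an iterable adds 1 for each element it yields
        counts.update(word for word in line.split() if word in keyword_set)
    return counts
```

For example, `count_keywords(open('path_to_file_chat_log'), open('path_to_file_keywords').read().split())` would give the per-keyword totals, and `counts.most_common()` lists them in descending order.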
If you want to find all the words in File 1 that are also in File 2, you can use:
```python
with open("keyword_file") as f:
    keywords = {word for line in f for word in line.split()}
with open("log_file") as f:
    words = {word for line in f for word in line.split()}
common = words.intersection(keywords)
```
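Chat logs usually carry punctuation and mixed case ("Hello," or "BYE."), which a plain `split()` keeps attached to the word. A sketch, for Python 3, that normalizes words before intersecting. Treating a match as case-insensitive with surrounding punctuation ignored is an assumption about what the asker wants, and `normalize` and `common_words` are hypothetical names.

```python
import string


def normalize(word):
    """Lowercase a word and strip surrounding ASCII punctuation."""
    return word.strip(string.punctuation).lower()


def common_words(keyword_lines, log_lines):
    """Return the set of (normalized) keywords that occur anywhere in the log."""
    keywords = {normalize(w) for line in keyword_lines for w in line.split()}
    words = {normalize(w) for line in log_lines for w in line.split()}
    keywords.discard("")  # a token made only of punctuation normalizes to ""
    return words & keywords
```

Like the set version above, this only says *which* keywords occur, not where or how often.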
To report each match while reading File 1 instead:
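```python
with open("keyword_file") as f:
    keywords = {word for line in f for word in line.split()}
with open("log_file") as log:
    for line in log:
        for word in line.split():  # split into words; iterating the string itself yields characters
            if word in keywords:
                print("found {0} in line {1}".format(word, line.rstrip()))
```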
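A variation that also reports *where* each match is: compile all keywords into one regular expression with word boundaries, then scan the log line by line. This is a Python 3 sketch with a hypothetical function name, `find_keyword_positions`; it assumes the keywords are ordinary words (since `\b` only marks boundaries next to letters, digits and underscores) and matches case-insensitively. Columns are 0-based.

```python
import re


def find_keyword_positions(log_lines, keywords):
    """Yield (line_number, column, matched_text) for each whole-word keyword match."""
    words = {k.strip() for k in keywords if k.strip()}
    if not words:
        return  # an empty alternation would match at every word boundary
    # Longest keywords first, so a multi-word keyword wins over its prefix
    alternatives = "|".join(map(re.escape, sorted(words, key=len, reverse=True)))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    for line_number, line in enumerate(log_lines, start=1):
        for match in pattern.finditer(line):
            yield line_number, match.start(), match.group(0)
```

Unlike a substring test, the word boundaries stop "hello" from matching inside "othello".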