Saqib Alam

Reputation: 53

How to compare the most common words in one text file to another text file

I have two text files. From textfile1, I selected the 50 most common words. Now I want to search for these 50 words in textfile2 and count how often each one occurs there.

import re
from collections import Counter

# Read the first file and keep only words longer than 3 characters
with open('textfile1.text', 'r') as readFile:
    sepFile = readFile.read()
words = re.findall(r'\w+', sepFile)
word_long = [w for w in words if len(w) > 3]
# 50 most common words in file 1, as (word, count) pairs
list1 = Counter(word_long).most_common(50)

# Read the second file and count its words the same way
with open('textfile2.txt', 'r') as readFile1:
    sepFile1 = readFile1.read()
words1 = re.findall(r'\w+', sepFile1)
word_long1 = [w for w in words1 if len(w) > 3]
c = Counter(word_long1)

# Print each common word from file 1 with its count in file 2
for w, _ in list1:
    print(w, c.get(w, 0))

Upvotes: 0

Views: 305

Answers (1)

Samira N

Reputation: 163

It would probably be helpful to use dictionaries. Counter.most_common() returns a list of tuples, which you can convert into a dict:

file1_common_words = dict(Counter(all_words_in_file1).most_common(50))
file2_common_words = dict(Counter(all_words_in_file2).most_common(50))
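
As a small illustration of that conversion, here is a sketch that works on an in-memory string instead of a file. The sample text and the top_words helper name are made up for the example, and the length filter mirrors the one in the question.

```python
import re
from collections import Counter

def top_words(text, n=50, min_len=4):
    """Return a dict mapping the n most common words to their counts.

    Hypothetical helper: only words with at least min_len characters are kept.
    """
    words = [w for w in re.findall(r'\w+', text) if len(w) >= min_len]
    # most_common() gives a list of (word, count) tuples; dict() turns it into a lookup table
    return dict(Counter(words).most_common(n))

sample = "spam eggs spam bacon spam eggs to be or not"
common = top_words(sample, n=2)
print(common)
```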

Then, for each word in file1_common_words, you can look up that word in file2_common_words to get its count in file 2:

for (word, count) in file1_common_words.items():
    try: 
        count_in_file2 = file2_common_words[word]
    except KeyError: 
        # if the word is not present in file2_common_words,
        # then its count is 0.
        count_in_file2 = 0 
    print("{0}\t{1}\t{2}".format(word, count, count_in_file2))

This will output lines of the format:

<most_common_word_1>    <count_in_file1>    <count_in_file2>
<most_common_word_2>    <count_in_file1>    <count_in_file2>
...
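
One caveat with the lookup above: file2_common_words only holds the top 50 words of file 2, so a word that does occur in file 2 but falls outside its top 50 is reported as 0. If the actual count is wanted, one possible variant is to keep the full Counter for file 2. A minimal sketch, assuming both texts are already loaded as strings (the compare_counts name and the sample strings are only illustrative):

```python
import re
from collections import Counter

def compare_counts(text1, text2, n=50, min_len=4):
    """For the n most common words of text1, return (word, count1, count2) tuples."""
    def tokenize(text):
        # Same filter as in the question: keep words of at least min_len characters
        return [w for w in re.findall(r'\w+', text) if len(w) >= min_len]

    counts2 = Counter(tokenize(text2))  # full counts for text2, not just its top n
    # A Counter returns 0 for missing keys, so no try/except is needed
    return [(w, c1, counts2[w]) for w, c1 in Counter(tokenize(text1)).most_common(n)]

rows = compare_counts("apple apple pear apple pear plum", "pear pear plum kiwi")
for word, c1, c2 in rows:
    print("{0}\t{1}\t{2}".format(word, c1, c2))
```

To use it on the files from the question, read each file into a string first and pass the two strings in.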

Upvotes: 1
