I have two text files. From textfile1, I selected the 50 most common words. Now I want to search for these 50 words in textfile2 and find how often each of them occurs there.
import re
from collections import Counter

with open('textfile1.text', 'r') as readFile:
    sepFile = readFile.read()
words = re.findall(r'\w+', sepFile)
# keep only words longer than 3 characters, then take the 50 most common
word_long = [w for w in words if len(w) > 3]
list1 = Counter(word_long).most_common(50)

with open('textfile2.txt', 'r') as readFile1:
    sepFile1 = readFile1.read()
word2 = re.findall(r'\w+', sepFile1)
word_long1 = [w for w in word2 if len(w) > 3]
list2 = Counter(word_long1).most_common(50)

# count how often each of textfile1's top-50 words occurs in textfile2
c = Counter(word2)
for w, _ in list1:
    print(w, c.get(w, 0))
It would probably help to use dictionaries. Counter.most_common() returns a list of (word, count) tuples, which you can convert into a dict:
file1_common_words = dict(Counter(all_words_in_file1).most_common(50))
file2_common_words = dict(Counter(all_words_in_file2).most_common(50))
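To see what that conversion produces, here is a small sketch on a made-up sentence (the sample text is an assumption, purely for illustration):

```python
from collections import Counter

# made-up sample text, for illustration only
words = "the cat and the dog and the bird".split()

top = Counter(words).most_common(2)
print(top)        # [('the', 3), ('and', 2)]  (a list of (word, count) tuples)

as_dict = dict(top)
print(as_dict)    # {'the': 3, 'and': 2}
```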
Then, for each word in file1_common_words, you can look up that word in file2_common_words to get its count in file 2:
for (word, count) in file1_common_words.items():
    try:
        count_in_file2 = file2_common_words[word]
    except KeyError:
        # if the word is not present in file2_common_words
        # (i.e. it is not among file 2's 50 most common words),
        # report its count as 0.
        count_in_file2 = 0
    print("{0}\t{1}\t{2}".format(word, count, count_in_file2))
This will output lines of the format:
<most_common_word_1> <count_in_file1> <count_in_file2>
<most_common_word_2> <count_in_file1> <count_in_file2>
...
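Putting it together, here is a minimal self-contained sketch of the same approach. It works on strings instead of files so it can be run directly, reuses the question's \w+ tokenizer and "longer than 3 characters" filter, and replaces the try/except with the equivalent dict.get(word, 0). The helper names top_words and compare_common_words and the sample strings are hypothetical.

```python
import re
from collections import Counter


def top_words(text, n=50):
    """Hypothetical helper: {word: count} for the n most common words longer than 3 chars."""
    words = [w for w in re.findall(r"\w+", text) if len(w) > 3]
    return dict(Counter(words).most_common(n))


def compare_common_words(text1, text2, n=50):
    """Hypothetical helper: (word, count_in_text1, count_in_text2) for text1's top-n words."""
    file1_common_words = top_words(text1, n)
    file2_common_words = top_words(text2, n)
    rows = []
    for word, count in file1_common_words.items():
        # same result as the try/except KeyError version: a missing word counts as 0
        rows.append((word, count, file2_common_words.get(word, 0)))
    return rows


if __name__ == "__main__":
    # made-up sample texts standing in for the contents of the two files
    text1 = "apple banana apple cherry banana apple"
    text2 = "banana banana date"
    for word, count, count_in_file2 in compare_common_words(text1, text2, n=3):
        print("{0}\t{1}\t{2}".format(word, count, count_in_file2))
```

Like the answer, this reports 0 for any word that is not among file 2's most common words, even if it occurs in file 2. Because a Counter returns 0 for missing keys, looking each word up in the full Counter of file 2's words instead would give its count anywhere in file 2.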