Reputation: 21
I would really appreciate some help solving this, or a pointer in the right direction. Thanks!
The task: show the 7 most common words found in the text, but exclude words that appear in a list of common words. The list of common words is in common-words.txt.
common-words.txt contains lots of different words.
First, I found the 7 most common words in the text. This is what my code looks like:
print("The 7 most frequently used words is:")
print("\n")
import re
from collections import Counter
with open("alice-ch1.txt") as f:
    passage = f.read()
words = re.findall(r'\w+', passage)
cap_words = [word.upper() for word in words]
word_counts = Counter(cap_words).most_common(7)
print(word_counts)
This works, and I get the output:
[('THE', 93), ('SHE', 80), ('TO', 75), ('IT', 67), ('AND', 65), ('WAS', 53), ('A', 52)]
Now I want to compare these two text files: if any word from my text file is also in common-words.txt, I want it removed from the result.
I tried this code:
dic_no_cw = dict(word_counts)
with open("common-words.txt", 'r') as cw:
    commonwords = list(cw.read().split())
for key, value in list(dic_no_cw.items()):
    for line in commonwords:
        if key == line:
            del dic_no_cw[key]
dict_copy = dict(dic_no_cw)
dic_no_cw7 = Counter(dic_no_cw).most_common(7)
sorted(dic_no_cw7)
print(dic_no_cw7)
And I get the same output:
[('THE', 93), ('SHE', 80), ('TO', 75), ('IT', 67), ('AND', 65), ('WAS', 53), ('A', 52)]
I could really use some help solving this, or a hint so I can maybe figure it out myself.
Thanks,
Upvotes: 0
Views: 109
Reputation: 308
I haven't checked it, but I think it may be that you're simply checking the value in the dict (which represents the number of times the word appears) instead of checking the key (the actual word itself) when comparing against the words in the commonwords list:
I believe "if value == line:" should read "if key == line:".
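To illustrate the distinction, here is a small sketch of what dict.items() yields for a dict built from Counter.most_common(). The sample words are made up for the example and are not taken from your files:

```python
from collections import Counter

# Hypothetical sample data standing in for the question's word_counts.
word_counts = Counter(["THE", "THE", "RABBIT"]).most_common(2)
dic_no_cw = dict(word_counts)

for key, value in dic_no_cw.items():
    # key is the word (a str), value is how often it occurs (an int).
    print(key, value)

# An int count never equals a word string, so this is False for every entry.
value_matches = [value == "THE" for value in dic_no_cw.values()]
# Comparing the key is what actually finds the word.
key_matches = [key == "THE" for key in dic_no_cw]
```

So any test against the common-words list has to use the key, since the value only holds the count.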
Upvotes: 0
Reputation: 1748
Can you try replacing these lines of your code:
for line in commonwords:
    if key == line:
        del dic_no_cw[key]
with
for line in commonwords:
    if key.strip() == line.upper().strip():
        del dic_no_cw[key]
        break
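One more thing to watch for: your code takes the top 7 first and filters afterwards, so once the comparison matches you may end up with fewer than 7 words (possibly none). Below is a rough sketch of an alternative that filters before counting. It assumes common-words.txt is a whitespace-separated list of lowercase words, which I can't verify from the question. The function name top_uncommon_words and the sample data are made up for illustration:

```python
import re
from collections import Counter

def top_uncommon_words(text, common_words, n=7):
    """Return the n most frequent words in text that are not in common_words."""
    # Normalise both sides to upper case so "the" and "THE" compare equal.
    stop = {w.upper() for w in common_words}
    words = [w.upper() for w in re.findall(r"\w+", text)]
    # Filter first, then count, so up to n uncommon words are returned.
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(n)

# Made-up sample input; replace with the contents of your own files.
sample_text = "The rabbit saw the cat and the rabbit ran"
sample_common = ["the", "and", "a"]
result = top_uncommon_words(sample_text, sample_common, n=2)
print(result)
```

With your files you would pass in the text read from alice-ch1.txt and the result of cw.read().split() for common-words.txt. Using a set for the common words also avoids the inner loop, since "w not in stop" is a single lookup.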
Upvotes: 1