Reputation: 49
How can I find out how many keywords from one file also appear in another file? I have a file (keywords.txt) containing a list of keywords, and I'm trying to find out whether another file (tweets.txt), which contains sentences, contains any of those keywords.
def main():
    done = False
    while not done:
        try:
            keywords = input("Enter the filename titled keywords: ")
            with open(keywords, "r") as words:
                done = True
        except IOError:
            print("Error: file not found.")

    total = 0
    try:
        tweets = input("Enter the file name titled tweets: ")
        with open(tweets, "r") as tweets:
            pass  # not sure what goes here yet
    except IOError:
        print("Error: file not found.")

def sentiment_of_msg(msg_words_counter):
    summary = 0
    for line in tweets:
        if happy_dict in line:
            summary += 10 * quantity  # quantity = the number of keywords in the sentence
        elif veryUnhappy_dict in line:
            summary += 1 * quantity
        elif neutral_dict in line:
            summary += 5 * quantity
    return summary
Upvotes: 0
Views: 286
Reputation: 880
I'm sensing that this is homework, so the best I can do is give you an outline of the solution.
If you can afford to load the files in memory, build a set of the keywords; a membership test like `if token in keywords_set` then runs in constant time. Pseudocode would look like this:
keywords_set = set()
with open("keywords.txt") as file:
    for line in file:
        for word in line.split():
            keywords_set.add(word)

with open("tweets.txt") as file:
    for line in file:
        preprocess(line)  # function with your custom logic
        for item in line.split():
            if item in keywords_set:
                do_stuff()  # function with your custom logic
If you want the frequency of keywords, build a dictionary mapping each keyword to its frequency. Or check out `collections.Counter` and think about how you could solve your problem with it.
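A minimal sketch of the `Counter` approach, using inline sample data in place of keywords.txt and tweets.txt (the keywords and tweets here are made up for illustration):

```python
from collections import Counter

# Sample data standing in for the contents of keywords.txt and tweets.txt
keywords_set = {"happy", "sad", "angry"}
tweets = [
    "I am so happy today",
    "feeling sad and angry",
    "happy happy joy",
]

# Count how often each keyword appears across all tweets
frequency = Counter(
    word
    for tweet in tweets
    for word in tweet.split()
    if word in keywords_set
)

print(frequency)  # Counter({'happy': 3, 'sad': 1, 'angry': 1})
```

`Counter` is just a dict subclass, so missing keywords return 0 instead of raising a KeyError, which is convenient for scoring.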
If you cannot load the tweets file into memory, consider a lazy solution: read the big file incrementally using generators.
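A sketch of the lazy approach: a generator that yields tokens one line at a time, so memory use stays constant regardless of file size. `io.StringIO` stands in here for `open("tweets.txt")`:

```python
import io

def tokens(file_obj):
    """Lazily yield whitespace-separated tokens, one line at a time,
    so the whole file never has to fit in memory."""
    for line in file_obj:
        for word in line.split():
            yield word

keywords_set = {"happy", "sad"}

# In real use: with open("tweets.txt") as big_file: ...
big_file = io.StringIO("I am happy\nnothing here\nso sad today\n")

matches = sum(1 for word in tokens(big_file) if word in keywords_set)
print(matches)  # 2
```

Because `tokens` is a generator, the `sum(...)` consumes it one token at a time; nothing beyond the current line is ever held in memory.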
Upvotes: 1