HelloWorld4382

Reputation: 49

How to read file and calculate a specific value

How can I find out how many keywords from one file also appear in another file? I have a file (keywords.txt) containing a list of keywords, and I'm trying to find out whether another file (tweets.txt), which contains sentences, contains any of those keywords.

def main():
    done = False
    while not done:
        try:
            keywords = input("Enter the filename titled keywords: ")
            with open(keywords, "r") as words:
                done = True
        except IOError:
            print("Error: file not found.")

total = 0
try:
    tweets = input("Enter the file Name titled tweets: ")
    with open(tweets, 'r') as tweets:
        pass  # nothing is read from the file here yet
except IOError:
    print("Error: file not found.")

def sentiment_of_msg(msg_words_counter):
    summary = 0
    for line in tweets:
        if happy_dict in line:
            summary += 10 * quantity  # quantity = the number of keywords in the sentence of the file
        elif veryUnhappy_dict in line:
            summary += 1 * quantity
        elif neutral_dict in line:
            summary += 5 * quantity
    return summary

Upvotes: 0

Views: 286

Answers (1)

themistoklik

Reputation: 880

I'm sensing that this is homework so the best I can do is give you an outline for the solution.

If you can afford to load files in memory:

  • Load keywords.txt, read its lines, split them into tokens, and construct a set from them. Now you have a data structure capable of fast membership queries (i.e. you can ask if token in set and get an answer in constant time).
  • Load the tweets file as you did with keywords, and read its contents line by line (or however they are formatted). You might need to do some preprocessing (stripping whitespace, replacing unnecessary characters, deleting invalid words, commas, etc.). For every line, split it so you get the words for each tweet and ask whether any of the split words are in the keywords set.

Pseudocode would look like this:

keywords_file = open(keywords)
keywords_set = set()
for line in keywords_file.readlines():
    for word in line.split():
        keywords_set.add(word)
keywords_file.close()

tweets_file = open(tweets)
for line in tweets_file.readlines():
    line = preprocess(line)  # function with your custom logic
    for item in line.split():
        if item in keywords_set:
            do_stuff()  # function with your custom logic
tweets_file.close()

If you want the frequency of keywords, build a dictionary with {key: key_frequency}. Or check out Counter and think about how you could solve your problem with it.
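For example, a rough sketch of the frequency idea with collections.Counter (assuming the keywords_set built above and a file named tweets.txt) might look like:

from collections import Counter

keyword_counts = Counter()
with open("tweets.txt") as tweets_file:
    for line in tweets_file:
        # count every word of the tweet that is also a keyword
        for word in line.split():
            if word in keywords_set:
                keyword_counts[word] += 1

print(keyword_counts.most_common(5))  # the five most frequent keywords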

If you cannot load the tweets file into memory, consider a lazy solution for reading the big file using generators.
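A minimal sketch of that generator idea (process_tweet here is a hypothetical function with your custom logic) could be:

def read_tweets_lazily(path):
    # yield one stripped line at a time instead of loading the whole file
    with open(path) as f:
        for line in f:
            yield line.strip()

for tweet in read_tweets_lazily("tweets.txt"):
    process_tweet(tweet)  # hypothetical function with your custom logic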

Upvotes: 1
