Hanako Ohashi

Reputation: 1

Count frequency of word in text file in Python

I'm trying to figure out how to make a program that takes a file that the user chooses (by inputting the file name) and counts the frequency of each of the words the user inputs.

I have most of it working, but when I type in multiple words for the program to find, only the first word displays the correct frequency and the rest display as "0 occurrences".

file_name = input("What file would you like to open? ")
f = open(file_name, "r")
the_full_text = f.read()
words = the_full_text.split()
search_word = input("What words do you want to find? ").split(",")
len_list = len(search_word) 

word_number = 0
print()
print ('... analyzing ... hold on ...')
print()
print ('Frequency of word usage within', file_name+":")
for i in range(len_list):

    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print ("   ",format(search_word[word_number].strip(),'<20s'),"/", frequency, "occurrences")
    word_number = word_number + 1

For example, the output looks like this:

What file would you like to open? assignment_8.txt
What words do you want to find? wey, rights, dem

... analyzing ... hold on ...

Frequency of word usage within assignment_8.txt:
    wey                  / 96 occurrences
    rights               / 0 occurrences
    dem                  / 0 occurrences

What's wrong with my program? Please help :o

Upvotes: 0

Views: 4618

Answers (2)

Vijayendra Dwari

Reputation: 11

There are many ways this can be done. Below is a program that reads a .txt file and builds a dictionary mapping each word to its frequency; it also splits the text into sentences.

"""
Created on Fri Jun 11 17:06:52 2021

@author: Vijayendra Dwari
"""

sentences = []
wordlist = []

digits = "1,2,3,4,5,6,7,8,9,0"
punc = "!,@,$,%+,^,&,*,(),>,‚·<,},{,[],#,_ï,-,/,',’"
drop =    "a,is,are,when,then,an,the,we,us,upto,,them,their,from,for,in,of,at,to,out,in,and,into,any,but,also,too,that"
import os

FileName = input("Please enter the file name: ")
f = open('FileName',"r")
for line in f:    
line = " ".join(line.split())
line = "".join([c for c in line if c not in digits])   
line = "".join([c for c in line if c not in punc])
line = "".join(line.split('  '))

temp = line.split('.')
temp2 = line.split(' ')
sentences.append(temp)
wordlist.append(temp2)
word_dict = {'wordlist':'word_freq'}
wordcount=0
for i in range(0,len(sentences)):
    for word in wordlist[i]:
        if word not in drop:                        
            word_dict[word] = word_dict.get(word, 0) + 1
            wordcount += 1
        i=i+1
        word_freq = []    
for key, value in word_dict.items():
    word_freq.append((value, key))
   
f.close()
print(word_freq)
print(wordlist)
print(sentences)


Upvotes: 2

PM 2Ring

Reputation: 55469

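You need to strip off the spaces from your search words. Splitting "wey, rights, dem" on "," leaves a leading space on every word after the first (" rights", " dem"), so those never compare equal to the words in the file. You only call .strip() when printing, not when comparing.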
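As a small illustration of the whitespace problem, here is a sketch using the input from the question's example run. The variable names `raw` and `cleaned` are just for this demonstration.

```python
# The search string from the question's example run.
raw = "wey, rights, dem"

# Splitting on "," keeps the space that followed each comma.
pieces = raw.split(",")
print(pieces)  # ['wey', ' rights', ' dem']

# Stripping each piece removes that surrounding whitespace.
cleaned = [w.strip() for w in pieces]
print(cleaned)  # ['wey', 'rights', 'dem']

# ' rights' is never equal to 'rights', so its count stays at 0.
print(pieces[1] == "rights")   # False
print(cleaned[1] == "rights")  # True
```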

However, your current algorithm is very inefficient because it has to rescan the entire text for every search word. Here's a more efficient way. Firstly, we clean up the search words and put them into a list. Then we build a dictionary from that list to store the counts of each of those words when we find them in the text file.

file_name = input("What file would you like to open? ")
with open(file_name, "r") as f:
    words = f.read().split()

search_words = input("What words do you want to find? ").split(',')
search_words = [word.strip().lower() for word in search_words]
#print(search_words)
search_counts = dict.fromkeys(search_words, 0)

print ('\n... analyzing ... hold on ...')
for word in words:
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1

print ('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print("   {:<20s} / {} occurrences".format(word, search_counts[word]))

Upvotes: 1
