Reputation: 1
I'm trying to write a program that opens a file the user chooses (by entering its name) and counts the frequency of each word the user inputs.
I have most of it working, but when I type in multiple words for the program to find, only the first word displays the correct frequency and the rest display as "0 occurrences".
file_name = input("What file would you like to open? ")
f = open(file_name, "r")
the_full_text = f.read()
words = the_full_text.split()
search_word = input("What words do you want to find? ").split(",")
len_list = len(search_word)
word_number = 0
print()
print('... analyzing ... hold on ...')
print()
print('Frequency of word usage within', file_name + ":")
for i in range(len_list):
    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print(" ", format(search_word[word_number].strip(), '<20s'), "/", frequency, "occurrences")
    word_number = word_number + 1
For example, the output looks like this:
What file would you like to open? assignment_8.txt
What words do you want to find? wey, rights, dem
... analyzing ... hold on ...
Frequency of word usage within assignment_8.txt:
wey / 96 occurrences
rights / 0 occurrences
dem / 0 occurrences
What's wrong with my program? Please help :o
Upvotes: 0
Views: 4618
Reputation: 11
There are many ways this can be done. Below is a program that reads a .txt file and builds a dictionary mapping each word to its frequency; it also splits the text into sentences.
"""
Created on Fri Jun 11 17:06:52 2021
@author: Vijayendra Dwari
"""
sentences = []
wordlist = []
digits = "1234567890"
punc = "!@$%+^&*()<>{}[]#_-/'"
drop = "a,is,are,when,then,an,the,we,us,upto,them,their,from,for,in,of,at,to,out,and,into,any,but,also,too,that".split(",")

file_name = input("Please enter the file name: ")
f = open(file_name, "r")
for line in f:
    line = " ".join(line.split())                       # normalize whitespace
    line = "".join(c for c in line if c not in digits)  # strip digits
    line = "".join(c for c in line if c not in punc)    # strip punctuation
    sentences.append(line.split('.'))
    wordlist.append(line.split(' '))
f.close()

word_dict = {}
wordcount = 0
for i in range(len(sentences)):
    for word in wordlist[i]:
        if word and word not in drop:                   # skip empty strings and stopwords
            word_dict[word] = word_dict.get(word, 0) + 1
            wordcount += 1

word_freq = []
for key, value in word_dict.items():
    word_freq.append((value, key))

print(word_freq)
print(wordlist)
print(sentences)
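Since word_freq holds (count, word) tuples, sorting it gives the most frequent words first. A minimal sketch, using a small hypothetical word_dict in place of the one built from a real file:

```python
# Hypothetical sample counts, standing in for the word_dict built above
word_dict = {"wey": 96, "rights": 12, "dem": 7}

# Turn the dict into (count, word) tuples and sort descending by count
word_freq = sorted(((value, key) for key, value in word_dict.items()), reverse=True)
print(word_freq)  # → [(96, 'wey'), (12, 'rights'), (7, 'dem')]
```

Putting the count first in each tuple is what makes the plain sort order by frequency.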
Upvotes: 2
Reputation: 55469
You need to strip off the spaces from your search words.
However, your current algorithm is very inefficient because it has to rescan the entire text for every search word. Here's a more efficient way. Firstly, we clean up the search words and put them into a list. Then we build a dictionary from that list to store the counts of each of those words when we find them in the text file.
file_name = input("What file would you like to open? ")
with open(file_name, "r") as f:
    words = f.read().split()
search_words = input("What words do you want to find? ").split(',')
search_words = [word.strip().lower() for word in search_words]
#print(search_words)
search_counts = dict.fromkeys(search_words, 0)
print('\n... analyzing ... hold on ...')
for word in words:
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1
print('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print(" {:<20s} / {} occurrences".format(word, search_counts[word]))
Upvotes: 1
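For what it's worth, the standard library's collections.Counter can stand in for the manual counting dict. A sketch under the same assumptions (comma-separated search words, trailing punctuation stripped), with a hard-coded sample text in place of the file read:

```python
from collections import Counter

# Stand-in text instead of reading a file; the sample words are illustrative
text = "Wey dem, wey rights. Wey!"
words = [w.strip(",.!").lower() for w in text.split()]
counts = Counter(words)  # tallies every word in a single pass

search_words = [w.strip().lower() for w in "wey, rights, dem".split(",")]
for word in search_words:
    print(" {:<20s} / {} occurrences".format(word, counts[word]))
```

A Counter returns 0 for missing keys, so words that never appear print as "0 occurrences" without any special-casing.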