Reputation: 5
For an assignment, my professor wants us to read a text file line by line, then word by word, and build a dictionary counting how often each word appears. Here's what I have so far:
wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        for word in line.split():
            line = line.lower()
            word = word.strip(string.punctuation + string.digits)
            if word:
                wordcount[word] = line.count(word)
return wordcount
What happens is that my dictionary tells me how many times each word appears in a particular line, leaving me with mostly 1s even though some words show up many times across the entire text. How can I get my dictionary to count words across the entire text, not just within one line?
Upvotes: 0
Views: 4067
Reputation: 571
Here is another way to do this, in case you want to see one. It's not exactly line by line and word by word as you requested, but you should be aware of the collections module, which can be very useful.
from collections import Counter
# Instantiate an empty Counter
c = Counter()
with open('myfile.txt', 'r') as f:
    for line in f:
        # Do all the cleaning you need here
        c.update(line.lower().split())
# Get all the statistics you want, for example:
print(c.most_common(10))
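For the cleaning step, here is a minimal sketch, assuming that stripping leading and trailing punctuation and digits (as in the question) is all you need. The function name count_words and the sample lines are made up for illustration; the function takes any iterable of lines, so an open file would work as well.

```python
import string
from collections import Counter

def count_words(lines):
    """Count cleaned, lowercased words across an iterable of lines."""
    junk = string.punctuation + string.digits
    c = Counter()
    for line in lines:
        # Strip punctuation/digits from both ends of every token
        cleaned = (word.strip(junk) for word in line.lower().split())
        # Skip tokens that were nothing but punctuation or digits
        c.update(word for word in cleaned if word)
    return c

# Hypothetical sample input standing in for the lines of a file
sample = ["The cat, the hat.", "A cat!"]
counts = count_words(sample)
print(counts.most_common(2))
```

Because the Counter lives outside the loop over lines, the totals accumulate over the whole input rather than per line.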
Upvotes: 0
Reputation: 61
This is how I would do it:
import string
wordcount = {}
# Translation table that deletes every punctuation character (Python 3)
remove_punct = str.maketrans('', '', string.punctuation)
with open('test.txt', 'r') as f:
    for line in f:
        line = line.lower()  # I suppose you want boy and Boy to be the same word
        for word in line.split():
            # What if your word has funky punctuation chars next to it?
            word = word.translate(remove_punct)
            if not word:
                continue  # the token was punctuation only
            # If it's already in the dict, increase the count
            try:
                wordcount[word] += 1
            # If it's not, this is the first time we are adding it
            except KeyError:
                wordcount[word] = 1
print(wordcount)
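Note that translate and strip do not clean a word in the same way: strip only trims characters from the two ends, while a deleting translate also removes punctuation inside the word. A small sketch to compare them (the sample word is made up):

```python
import string

word = "(don't!)"

# strip() only removes punctuation at the start and end of the token
stripped = word.strip(string.punctuation)

# translate() with a deletion table removes punctuation everywhere,
# including the apostrophe inside the word
table = str.maketrans("", "", string.punctuation)
translated = word.translate(table)

print(stripped, translated)
```

Which one you want depends on whether "don't" and "dont" should be counted as the same word.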
Good luck!
Upvotes: 1
Reputation: 4056
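The problem is in this line:
wordcount[word] = line.count(word)
Every time that line executes, whatever value wordcount[word] held is replaced by line.count(word), when you want it to be added to. The inner loop already visits every word once, so add 1 per visit (adding line.count(word) would count a word that appears twice in a line four times), and use dict.get so that a word seen for the first time does not raise a KeyError. Try changing it to:
wordcount[word] = wordcount.get(word, 0) + 1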
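One more thing worth knowing about str.count: it counts substring matches, not whole words, so short words can be over-counted. A quick sketch with a made-up line:

```python
line = "the cat sat on the other mat"

# str.count matches substrings, so "the" is also found inside "other"
substring_hits = line.count("the")

# Comparing whole tokens counts only the actual words
whole_word_hits = line.split().count("the")

print(substring_hits, whole_word_hits)
```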
Upvotes: 1
Reputation: 19729
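The problem is that you are overwriting the count every time. The fix is quite simple: add to the existing entry instead. The loop already visits each word once, so add 1 per visit, and lowercase the line before splitting it so that the words themselves end up lowercased:
import string
wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.lower()  # lowercase before splitting so the words are lowercased too
        for word in line.split():
            word = word.strip(string.punctuation + string.digits)
            if word:
                if word in wordcount:
                    wordcount[word] += 1
                else:
                    wordcount[word] = 1
return wordcount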
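Since return is only valid inside a function, here is a sketch of the same loop wrapped in one, so it can be tried without a file on disk. The function name count_words, the parameter f, and the sample text are my own assumptions; any file-like object that yields lines should work.

```python
import io
import string

def count_words(f):
    """Return a dict mapping each cleaned, lowercased word to its total count."""
    wordcount = {}
    junk = string.punctuation + string.digits
    for line in f:
        for word in line.lower().split():
            word = word.strip(junk)
            if word:
                # get() returns 0 for a word not seen before
                wordcount[word] = wordcount.get(word, 0) + 1
    return wordcount

# io.StringIO stands in for an open text file here
fake_file = io.StringIO("The cat sat.\nThe cat ran; the dog sat.\n")
result = count_words(fake_file)
print(result)
```

With a real file you would call it as count_words(f) inside the with open(...) as f: block.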
Upvotes: 3