user

Reputation: 5

Read words from file into dictionary

For an assignment, my professor wants us to read a text file line by line, then word by word, and build a dictionary that counts how often each word appears. Here's what I have so far:

wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        for word in line.split():
            line = line.lower()
            word = word.strip(string.punctuation + string.digits)
            if word:
                wordcount[word] = line.count(word)
    return wordcount

The dictionary ends up holding how many times each word appears in a single line, so I get mostly 1s even for words that occur many times in the whole text. How can I make the dictionary count words across the entire text, not just one line?

Upvotes: 0

Views: 4067

Answers (4)

gl051

Reputation: 571

Here is another way to do this. It's not exactly line by line and word by word as you requested, but you should be aware of the collections module, which can be very useful.

from collections import Counter

# Create an empty Counter; it maps each word to its running total
c = Counter()
with open('myfile.txt', 'r') as f:
    for line in f:
        # Do all the cleaning you need here
        c.update(line.lower().split())

# Get whatever statistics you want, for example the 10 most frequent words:
print(c.most_common(10))
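
As a minimal sketch of what the cleaning step could look like, assuming you want to strip leading and trailing punctuation and digits as in the question (the helper name count_words is made up for illustration, and it takes any iterable of lines so it can be tried without a file):

```python
import string
from collections import Counter

def count_words(lines):
    """Count cleaned, lowercased words across an iterable of text lines."""
    counts = Counter()
    strip_chars = string.punctuation + string.digits
    for line in lines:
        # Lowercase, split on whitespace, then trim punctuation/digits
        words = (w.strip(strip_chars) for w in line.lower().split())
        # Skip tokens that became empty after trimming
        counts.update(w for w in words if w)
    return counts

sample = ["The cat saw the dog.", "The dog ran!"]
print(count_words(sample).most_common(2))  # [('the', 3), ('dog', 2)]
```

An open file object is also an iterable of lines, so count_words(f) should work inside a with open(...) block.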

Upvotes: 0

Sunworshipper

Reputation: 61

This is how I would do it:

import string

wordcount = {}
# Translation table that deletes every punctuation character (Python 3)
remove_punct = str.maketrans("", "", string.punctuation)
with open('test.txt', 'r') as f:
    for line in f:
        line = line.lower()  # I suppose you want boy and Boy to be the same word
        for word in line.split():
            # What if your word has funky punctuation chars next to it?
            word = word.translate(remove_punct)
            if not word:
                continue  # the token was nothing but punctuation
            # If it's already in the dict, increase the number
            try:
                wordcount[word] += 1
            # If it's not, this is the first time we are adding it
            except KeyError:
                wordcount[word] = 1

print(wordcount)

Good luck!

Upvotes: 1

khagler

Reputation: 4056

The problem is in this line:

wordcount[word] = line.count(word)

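Every time that line executes, the previous value of wordcount[word] is replaced by line.count(word), when you want the counts to add up instead. Since the loop already visits each word once per occurrence, add 1 each time, using get() so that a word seen for the first time starts from 0:

wordcount[word] = wordcount.get(word, 0) + 1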
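
One caveat when accumulating: line.count(word) is a substring count for the whole line, so adding it once per occurrence of the word can overcount. A small sketch of the difference, using a made-up sample line:

```python
line = "the cat and the hat"

# Adding line.count(word) once per occurrence double counts repeated words.
overcounted = {}
for word in line.split():
    overcounted[word] = overcounted.get(word, 0) + line.count(word)

# Adding 1 per occurrence gives the true frequency.
exact = {}
for word in line.split():
    exact[word] = exact.get(word, 0) + 1

print(overcounted["the"])  # 4
print(exact["the"])        # 2

# str.count also matches inside other words: "at" occurs in "cat" and "hat".
print(line.count("at"))    # 2
```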

Upvotes: 1

JCOC611

Reputation: 19729

The problem is that you overwrite the count every time instead of adding to it. The fix is quite simple:

import string

wordcount = {}
with open('/Users/user/Desktop/Text.txt', 'r', encoding='utf-8') as f:
    for line in f:
        # Lowercase before splitting so "Boy" and "boy" count as one word
        for word in line.lower().split():
            word = word.strip(string.punctuation + string.digits)
            if word:
                if word in wordcount:
                    # Seen before: add one for this occurrence
                    wordcount[word] += 1
                else:
                    # First time we see this word
                    wordcount[word] = 1
print(wordcount)
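
If the return in the question means this code lives inside a function, the same idea could be sketched as below. This is only an illustration: the function name word_frequencies is hypothetical, and defaultdict(int) is swapped in for the explicit if/else because missing keys then start at 0.

```python
import string
from collections import defaultdict

def word_frequencies(lines):
    """Return a dict mapping each cleaned, lowercased word to its total count."""
    counts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for word in line.lower().split():
            word = word.strip(string.punctuation + string.digits)
            if word:  # skip tokens that were only punctuation or digits
                counts[word] += 1
    return dict(counts)

print(word_frequencies(["Boy meets boy.", "A boy, 2 girls"]))
# {'boy': 3, 'meets': 1, 'a': 1, 'girls': 1}
```

To use it on a file, pass the open file object, for example word_frequencies(f) inside the with block.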

Upvotes: 3
