Emily

Reputation: 81

Count word frequency in a .txt file using only dictionary Python 3

I've been having trouble getting my program to output the number of times a word occurs in an imported .txt file. For my assignment, I can only use a dictionary (no Counter), and I must remove all punctuation and capitalization from the file. We are using Shakespeare's Hamlet from Project Gutenberg as an example (link). I've read other posts hoping to fix this, but to no avail. This answer by inspectorG4dget seems to illustrate my ideal program code, but when I run my program, a KeyError is raised for the chosen word. Here is my edited program (I still get the error with this code):

def word_dictionary(x):
    wordDict = {}
    filename = open(x, "r").read()
    filename = filename.lower()
    for ch in '"''!@#$%^&*()-_=+,<.>/?;:[{]}~`\|':
        filename = filename.replace(ch, " ")
    for line in filename:
        for word in line.strip().split():
            if word not in wordDict:
                wordDict[word] = wordDict.get(word, 0) + 1
    return wordDict

Here is a desired sample session:

>>> import shakespeare
>>> words_with_counts = shakespeare.word_dictionary("/Users/username/Desktop/hamlet.txt")
>>> words_with_counts['the']
993
>>> words_with_counts['laugh']
6

This is what I get:

>>> import HOPE
>>> words_with_counts = HOPE.word_dictionary("hamlet.txt")
>>> words_with_counts["the"]
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    words_with_counts["the"]
KeyError: 'the'

Would anyone be able to detect what is wrong with my code?? Any help is much appreciated!

Upvotes: 2

Views: 2375

Answers (3)

Phylogenesis

Reputation: 7890

You are using the wrong keys for your dictionary. The loop should be as follows:

for word in filename.strip().split():
    if word not in wordDict:
        wordDict[word] = 0
    wordDict[word] += 1
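
For illustration, here is a minimal self-contained sketch of that loop applied to an in-memory string instead of the file. The function name `count_words` and the sample sentence are made up for this example. It uses `dict.get` with a default of 0, which should behave the same as the `if` / `+= 1` pair above.

```python
def count_words(text):
    """Count whitespace-separated words in text using a plain dict."""
    counts = {}
    for word in text.split():
        # get() returns 0 for an unseen word, so no membership test is needed
        counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical sample input, standing in for the contents of hamlet.txt
sample = "to be or not to be"
result = count_words(sample)
print(result["to"], result["be"], result["or"])
```

Note that the increment must run for every word, not only for words that are missing from the dictionary; otherwise every count stays at 1.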

Upvotes: 3

Anupam

Reputation: 49

I think the error comes from

for line in filename:

Here 'filename' is a string, not a file object, because

filename = open(x, "r").read()

was used. Iterating over a string yields one character at a time, so 'line' is a single character, not a line. Try replacing the code with the function below:

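def word_dictionary(x):
    wordDict = {}
    # Read the whole file into one lowercase string; the with block closes the file
    with open(x, "r") as f:
        text = f.read().lower()
    # Replace each punctuation character with a space
    for ch in '"!@#$%^&*()-_=+,<.>/?;:[{]}~`\\|':
        text = text.replace(ch, " ")
    # split() on the whole string yields words, not characters
    for word in text.split():
        if word not in wordDict:
            wordDict[word] = 1
        else:
            wordDict[word] = wordDict[word] + 1
    return wordDict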
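
A quick way to check this, sketched with a made-up sample string rather than the Hamlet file: looping over a str yields one character at a time, so the original nested loop ends up with a dictionary keyed by single characters, and a lookup of 'the' then fails with KeyError.

```python
# Hypothetical sample text, standing in for the file contents
text = "the play the thing"

# Iterating over a string yields characters, not lines or words.
chars = [piece for piece in text][:3]

# Build the dictionary the way the original nested loop walked the string.
char_keys = {}
for line in text:
    for word in line.strip().split():
        char_keys[word] = char_keys.get(word, 0) + 1

# Splitting the whole string yields the words that were intended as keys.
word_keys = {}
for word in text.split():
    word_keys[word] = word_keys.get(word, 0) + 1

print(chars)
print("the" in char_keys, "the" in word_keys)
```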

Upvotes: 0

mdurant

Reputation: 28684

if word not in wordDict

and

`wordDict[1]` -> `wordDict[word]`

(two occurrences)

Why were you counting the length?

Upvotes: 1
