Reputation: 81
I've been having trouble getting my program to output the number of times a word appears in an imported .txt file. For my assignment, I can only use a plain dictionary (no Counter), and I must remove all punctuation and capitalization from the file. We are using Shakespeare's Hamlet from Project Gutenberg as an example (link). I've read other posts hoping to fix this, but to no avail. This answer by inspectorG4dget seems to illustrate my ideal program code, but when I run my program, a KeyError is raised for the chosen word. Here is my edited program (it still produces the error):
def word_dictionary(x):
    wordDict = {}
    filename = open(x, "r").read()
    filename = filename.lower()
    for ch in '"''!@#$%^&*()-_=+,<.>/?;:[{]}~`\|':
        filename = filename.replace(ch, " ")
    for line in filename:
        for word in line.strip().split():
            if word not in wordDict:
                wordDict[word] = wordDict.get(word, 0) + 1
    return wordDict
Here is a desired sample session:
>>> import shakespeare
>>> words_with_counts = shakespeare.word_dictionary("/Users/username/Desktop/hamlet.txt")
>>> words_with_counts['the']
993
>>> words_with_counts['laugh']
6
This is what I get:
>>> import HOPE
>>> words_with_counts = HOPE.word_dictionary("hamlet.txt")
>>> words_with_counts["the"]
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    words_with_counts["the"]
KeyError: 'the'
Can anyone spot what is wrong with my code? Any help is much appreciated!
Upvotes: 2
Views: 2375
Reputation: 7890
You are using the wrong keys for your dictionary. The loop should be as follows:
for word in filename.strip().split():
    if word not in wordDict:
        wordDict[word] = 0
    wordDict[word] += 1
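As a rough, self-contained sketch of the same counting idea: the function name `count_words` and the sample sentence are made up for illustration, and the function takes a string directly instead of reading hamlet.txt. It uses `dict.get`, which the question was already reaching for, in place of the explicit membership test.

```python
def count_words(text):
    """Count how often each whitespace-separated word occurs in text."""
    word_counts = {}
    for word in text.lower().split():
        # get() returns 0 for a word that has not been seen yet,
        # so every occurrence (first or later) adds exactly 1.
        word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts


# Hypothetical sample input, not taken from Hamlet.
counts = count_words("The cat and the hat")
print(counts["the"])  # 2
print(counts["cat"])  # 1
```

Either form works; the important part is that the increment runs for every word, not only for words missing from the dictionary.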
Upvotes: 3
Reputation: 49
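I think the error comes from this line:
for line in filename:
Here 'filename' is a string, not a file object, because it was assigned with
filename = open(x, "r").read()
Iterating over a string yields one character at a time, not one line, so every key stored in the dictionary is a single character and 'the' is never added. Try replacing the code with the function below: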
def word_dictionary(x):
    wordDict = {}
    filename = open(x,"r").read()
    filename = filename.lower()
    for ch in '"''!@#$%^&*()-_=+,<.>/?;:[{]}~`\|':
        filename = filename.replace(ch," ")
    for word in filename.split():
        if word not in wordDict:
            wordDict[word] = 1
        else:
            wordDict[word] = wordDict[word] + 1
    return wordDict
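To see the character-versus-line behaviour in isolation, here is a small sketch. The sample string is invented for illustration and stands in for the result of `open(x, "r").read()`.

```python
# Hypothetical stand-in for the text returned by open(x, "r").read().
text = "to be\nor not"

# Iterating over a str yields single characters, not lines.
chars = [piece for piece in text]
print(chars[:5])  # ['t', 'o', ' ', 'b', 'e']

# splitlines() is one way to get actual lines from a string.
lines = text.splitlines()
print(lines)  # ['to be', 'or not']

# The nested loop from the question therefore only ever sees
# one-character "words", so a key like 'not' is never created.
keys = set()
for line in text:
    for word in line.strip().split():
        keys.add(word)
print(sorted(keys))  # ['b', 'e', 'n', 'o', 'r', 't']
```

Since `split()` with no arguments already splits on newlines as well as spaces, calling it once on the whole string is enough and no per-line loop is needed.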
Upvotes: 0
Reputation: 28684
if word not in wordDict
and
`wordDict[1]` -> `wordDict[word]`
(two occurrences)
Why were you counting the length?
Upvotes: 1