Reputation:
I'm trying to open and read a text file and count the number of times each word occurs. For example, if the word "better" appears 8 times in the text, it would have a frequency of 8. I have attached the code below. I got the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 861: invalid start byte
file = open('IntroductoryCS.txt')
wordcount = {}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for k, v in wordcount.items():
    print(k, v)
I am using IDLE 3.5.1
Upvotes: 0
Views: 79
Reputation: 37
Your code is working fine.
Try saving the text file as UTF-8: open the file in Notepad, choose Save As, and select UTF-8 as the encoding.
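If you would rather do the conversion in Python than in Notepad, something like the sketch below should work. It assumes the file was saved in Windows-1252 (cp1252), which is only a guess based on the 0x97 byte in the error; convert_to_utf8 and its parameter names are made up for illustration.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-save a text file as UTF-8.

    src_encoding is an assumption: cp1252 is a common default for
    files created on Windows, but the real encoding may differ.
    """
    # Decode the original bytes using the assumed source encoding.
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    # Write the same text back out, this time encoded as UTF-8.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

After that, open() on the new file with encoding='utf-8' should read it without the UnicodeDecodeError.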
Upvotes: 1
Reputation: 538
It seems that your IntroductoryCS.txt is not encoded in UTF-8.
You should pass the file's actual encoding to the open() function.
Something like this:
file=open('IntroductoryCS.txt', encoding='<your_encoding_here>')
See the documentation for open().
I don't know which encoding your file uses, but try this:
file=open('IntroductoryCS.txt', encoding='latin-1')
The available encodings are listed under "Standard Encodings" in the documentation for the codecs module.
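For what it's worth, byte 0x97 is the em dash character in Windows-1252 (cp1252), so that encoding is a plausible candidate for a file written on Windows. Here is a rough sketch of the word count with an explicit encoding; the cp1252 default is my assumption, and count_words is just an illustrative name.

```python
from collections import Counter

# 0x97 cannot start a UTF-8 sequence, which is what the error reports.
sample = b"better \x97 better"
try:
    sample.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

# The same bytes decode cleanly as cp1252 (assumed encoding).
print(sample.decode("cp1252").split())


def count_words(path, encoding="cp1252"):
    """Count whitespace-separated words in a text file.

    The default encoding is an assumption; pass the real one if known.
    """
    # The with-block closes the file even if decoding raises an error.
    with open(path, encoding=encoding) as f:
        return Counter(f.read().split())
```

You could then print the frequencies with `for word, n in count_words('IntroductoryCS.txt').items(): print(word, n)`. Note that latin-1 never raises a decode error because it maps every byte to a character, but it turns 0x97 into an invisible control character rather than a dash.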
Upvotes: 1