user9174392

Python Word occurrence

I'm trying to open and read a text file and count the number of times each word occurs. For example, if the word "better" appears eight times in the text, it would have a frequency of 8. My code is below. I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 861: invalid start byte

file = open('IntroductoryCS.txt')

wordcount = {}

for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

for k, v in wordcount.items():
    print(k, v)

I am using IDLE 3.5.1

Upvotes: 0

Views: 79

Answers (2)

Joao

Reputation: 37

Your code is working fine.

Try saving the text file as UTF-8: open the file in Notepad, choose Save As, and select UTF-8 as the encoding.
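If you would rather do the conversion in Python than in Notepad, something like the sketch below might work. The helper name convert_to_utf8 and the file names are made up for illustration, and cp1252 is only a guess at the original encoding (in that code page, byte 0x97 is an em dash); substitute whatever encoding your file really uses.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding):
    """Read src_path using src_encoding and write the same text to dst_path as UTF-8."""
    with open(src_path, encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


# Demo on a throwaway file containing the problematic byte 0x97.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "original.txt")   # hypothetical file names
dst_path = os.path.join(workdir, "utf8.txt")

with open(src_path, "wb") as f:
    f.write(b"better \x97 best")

# Assumption: the source file is cp1252 (Windows-1252).
convert_to_utf8(src_path, dst_path, "cp1252")

# The converted copy can now be opened as UTF-8 without a UnicodeDecodeError.
with open(dst_path, encoding="utf-8") as f:
    converted = f.read()

print(len(converted.split()))
```

After the conversion, the converted copy can be opened with encoding='utf-8' and no longer raises the error.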

Upvotes: 1

kolurbo

Reputation: 538

It seems that your IntroductoryCS.txt is not encoded in UTF-8.

You should pass the file's actual encoding to the open() function.

Something like this:

file=open('IntroductoryCS.txt', encoding='<your_encoding_here>')

See the documentation for open().

I don't know what encoding your file uses, but try this:

file=open('IntroductoryCS.txt', encoding='latin-1')

The available encodings are listed under "Standard Encodings" in the codecs module documentation.
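Putting it together, here is a minimal sketch of the word count with an explicit encoding. The function name count_words is just for illustration, and cp1252 is an assumption on my part: byte 0x97 is an em dash in that Windows code page, whereas latin-1 will also decode it without error but maps it to an invisible control character. The demo writes a small temporary file so it runs on its own; for the real file you would pass 'IntroductoryCS.txt' instead.

```python
import os
import tempfile
from collections import Counter


def count_words(path, encoding):
    """Return a Counter mapping each whitespace-separated word to its frequency."""
    # The with-block closes the file even if decoding fails.
    with open(path, encoding=encoding) as f:
        return Counter(f.read().split())


# Demo: a throwaway file containing byte 0x97, which is invalid as UTF-8.
sample = b"better \x97 much better"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# Assumption: the file is cp1252; swap in the real encoding if it differs.
counts = count_words(sample_path, encoding="cp1252")
os.remove(sample_path)

print(counts["better"])
```

Counter does the same job as the manual dictionary in the question, and counts.most_common() returns the words sorted by frequency if you want the output ordered.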

Upvotes: 1
