Neil Walker

Reputation: 6848

codec can't decode byte 0x81

I have this simple bit of code:

file = open(filename, "r", encoding="utf-8")
num_lines = sum(1 for line in open(filename))

I simply want to get the number of lines in the file. However, I keep getting the error below. I'm thinking of just skipping Python and doing it in C# ;-)

Can anyone help? I added 'utf-8' after searching for the error and reading that it should fix it. The file is just a simple text file, not an image, albeit a large one. It's actually CSV data, but I just want to get an idea of the number of lines before I start processing it.

Many thanks.

in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4344: character maps to <undefined>

Upvotes: 1

Views: 19334

Answers (1)

cocool97

Reputation: 1251

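This is an encoding problem.
In your example code, you are opening the file twice, and the second open() call doesn't include the encoding. That second call is the one you iterate over, so Python falls back to the platform's default encoding (the 'charmap' codec in your traceback, most likely cp1252 on Windows), in which byte 0x81 is undefined.
Iterate over the file object you already opened with the encoding instead: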

file = open(filename, "r", encoding="utf-8")
num_lines = sum(1 for line in file)

Or, preferably, with a context manager so the file is closed automatically:

with open(filename, "r", encoding="utf-8") as file:
    num_lines = sum(1 for line in file)
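
To make the cause concrete, here is a minimal sketch that reproduces the error on a throwaway file and shows two ways to count lines. The helper names count_lines and count_lines_binary and the sample CSV content are made up for illustration; cp1252 is used explicitly on the assumption that it was the default encoding on your machine.

```python
import os
import tempfile


def count_lines(path, encoding="utf-8"):
    """Count lines in a text file, decoding with an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return sum(1 for _ in f)


def count_lines_binary(path):
    """Count lines without decoding, so no UnicodeDecodeError is possible."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


# Hypothetical sample data: "Á" is encoded in UTF-8 as the bytes C3 81,
# so the file contains the same 0x81 byte as in the traceback.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n1,Ágnes\n2,Bob\n".encode("utf-8"))
    demo_path = tmp.name

utf8_count = count_lines(demo_path)            # explicit UTF-8 decoding
binary_count = count_lines_binary(demo_path)   # no decoding at all

# Decoding the same bytes as cp1252 fails, because 0x81 is unmapped there.
try:
    count_lines(demo_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

os.remove(demo_path)
```

If all you need is a rough line count before processing, the binary version may be the safer choice, since it works regardless of what encoding the file turns out to use.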

Upvotes: 6
