soulsister

Reputation: 25

How can I fix "UnicodeDecodeError: 'utf-8' codec can't decode bytes..." in Python?

I need to read specific rows and columns of a CSV file and write them to a text file, but I get a Unicode decode error.

import csv

with open('output.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

Upvotes: 1

Views: 6044

Answers (2)

Abhinav Sood

Reputation: 789

The most likely reason for this error is that your CSV file is not encoded in UTF-8. Find out which encoding the file was actually saved with.

First of all, try the platform's default encoding (it depends on your locale) by leaving out the encoding parameter:

with open('output.csv', 'r') as f:
    ...

If that does not work, try other commonly used encodings, for example:

with open('output.csv', 'r', encoding="ISO-8859-1") as f:
    ...
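
The trial-and-error approach above can be sketched as a small helper. The function name read_csv_rows and the list of candidate encodings are assumptions made for illustration, not something from the question. Note that ISO-8859-1 maps every possible byte to a character, so it never raises and should come last; a decode that succeeds does not prove the guess was right, so check the text for odd characters.

```python
import csv

# Hypothetical helper: the candidate encodings are an assumption, adjust to your data.
def read_csv_rows(path, encodings=("utf-8", "cp1252", "iso-8859-1")):
    """Return (rows, encoding) for the first candidate that decodes the whole file."""
    for enc in encodings:
        try:
            # newline="" is what the csv module documentation recommends
            with open(path, "r", encoding=enc, newline="") as f:
                # list() forces the whole file to be read inside the try block
                return list(csv.reader(f)), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError(f"none of {encodings} could decode {path!r}")
```

Calling rows, enc = read_csv_rows('output.csv') would return the parsed rows together with the encoding that worked, which is worth printing once so you can hard-code it afterwards.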

Upvotes: 1

Serge Ballesta

Reputation: 148900

If you get a Unicode decode error with this code, it is likely that the CSV file is not UTF-8 encoded... The correct fix is to find out what the actual encoding is and use it.

If you only want quick and dirty workarounds, Python offers the errors=... option of open. From the documentation of the open function in the standard library:
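
One way to start looking for the real encoding is to inspect the bytes that UTF-8 rejects. The sketch below is only an illustration: the helper name find_bad_bytes and the context width are assumptions. It reads the file in binary mode and uses the start and end attributes of UnicodeDecodeError to report the first offending bytes.

```python
# Hypothetical helper for inspection; its name and the context width are assumptions.
def find_bad_bytes(path, encoding="utf-8", context=10):
    """Return None if the file decodes cleanly, else (offset, bad_bytes, nearby_bytes)."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the bytes the codec could not decode
        start = max(exc.start - context, 0)
        return exc.start, raw[exc.start:exc.end], raw[start:exc.end + context]
    return None
```

A single byte such as 0xE9 in a place where you expect "é" would point towards ISO-8859-1 or cp1252, since both encode "é" as that byte.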

'strict' to raise a ValueError exception if there is an encoding error. The default value of None has the same effect.
'ignore' ignores errors. Note that ignoring encoding errors can lead to data loss.
'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.
'surrogateescape' will represent any incorrect bytes as low surrogate code units ranging from U+DC80 to U+DCFF. These surrogate code units will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding.
'xmlcharrefreplace' is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference &#nnn;.
'backslashreplace' replaces malformed data by Python’s backslashed escape sequences.
'namereplace' (also only supported when writing) replaces unsupported characters with \N{...} escape sequences.

I often use errors='replace' when I only want to know that there were erroneous bytes, or errors='backslashreplace' when I want to know what they were.
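
To see what these handlers do, here is a minimal sketch that decodes a few sample bytes directly with bytes.decode, which accepts the same errors values as open. The sample data and the helper name compare_handlers are assumptions made for illustration.

```python
# Hypothetical helper: decode the same bytes with several error handlers.
def compare_handlers(raw, encoding="utf-8"):
    handlers = ("ignore", "replace", "backslashreplace", "surrogateescape")
    return {h: raw.decode(encoding, errors=h) for h in handlers}

# Sample bytes (an assumption): "café,5€" as cp1252, which is not valid UTF-8.
raw = b"caf\xe9,5\x80"
results = compare_handlers(raw)

for name, text in results.items():
    # ascii() keeps the output printable even for lone surrogates
    print(f"{name:>16}: {ascii(text)}")

# surrogateescape is the only handler here that lets you recover the original bytes
restored = results["surrogateescape"].encode("utf-8", errors="surrogateescape")
```

With 'replace' each bad byte becomes U+FFFD, so you can see that something was wrong; with 'backslashreplace' the text contains \xe9 and \x80, so you can see which bytes they were.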

Upvotes: 0
