Rice

Reputation: 1

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 31: invalid start byte

I'm encountering this error on only one file in a list of seemingly identical files. My code is as follows:

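import os
import pandas

# Raw string so the backslashes in the Windows path are not read as escape sequences
data_dir = r'C:\Users\ebook\Downloads\Batch One Set\Sample Output'

for filename in os.listdir(data_dir):
    # File name without its extension, e.g. "Patient 35-1"
    title = filename.split('.')[0]
    metadata = pandas.read_csv(os.path.join(data_dir, filename), nrows=60)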

The error occurs in the pandas.read_csv function.

Everything works for the earlier files, such as "Patient 3-1.csv" and "Patient 34-1.csv", but the error pops up on "Patient 35-1.csv". Any ideas why?

UPDATE: The file contains non-ASCII characters, most notably a degree sign, in "TOS - R Digit 3 (^13014669E 90° ^13014670E)".
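
For reference, here is a minimal sketch of why that character would trip the UTF-8 decoder. It assumes the file stores the degree sign as the single byte 0xb0 (as Latin-1 and Windows-1252 do); the sample bytes are made up for illustration and are not taken from the actual file.

```python
# Hypothetical sample: "90" followed by the byte reported in the traceback.
raw = b"90\xb0"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xb0 is a continuation byte in UTF-8, so it cannot start a character.
    utf8_error = str(exc)
else:
    utf8_error = None

# In Latin-1 every byte maps to a character; 0xb0 is the degree sign.
as_latin1 = raw.decode("latin1")

print(utf8_error)
print(as_latin1)
```

If this matches what is in the file, a single-byte encoding such as 'latin1' or 'cp1252' would be the natural thing to try.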

Any advice on encodings that might resolve this?

Setting encoding='unicode_escape' or encoding='latin1', and switching to engine='python', did not fix the issue.

UPDATE: Upgrading pandas to the latest version (2.0.3) and setting encoding='latin1' solved the issue.
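
In case it helps anyone with a similar batch of files, below is a rough sketch of checking which encoding a file decodes with before handing it to pandas. The helper name `first_working_encoding`, the candidate list, and the throwaway sample file are my own assumptions for illustration. Note that a successful decode does not prove the encoding is the right one: 'latin1' accepts any byte sequence, so it only makes sense as the last fallback.

```python
import os
import tempfile


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate encoding that decodes the whole file, or None."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot read the file, try the next one
        return name
    return None


# Demo on a throwaway file that stores the degree sign as the byte 0xb0.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "wb") as fh:
        fh.write("label,value\nTOS - R Digit 3 90\u00b0,1\n".encode("latin1"))
    detected = first_working_encoding(sample)

print(detected)
```

The returned name could then be passed as the `encoding` argument of `pandas.read_csv`, alongside `nrows=60` as in the loop above.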

Upvotes: 0

Views: 76

Answers (0)