Suraj_j

Reputation: 184

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 8

I am trying to read a CSV using pd.read_csv, but I get an error:

    UnicodeDecodeError Traceback (most recent call last)
    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
    pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
    pandas\_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 8: invalid start byte

    During handling of the above exception, another exception occurred:

    UnicodeDecodeError Traceback (most recent call last)
    <ipython-input-84-c0272ccf19e6> in <module>

Sample of my data:

Time,Type,Instrument,Product,Qty.,Avg. price,Status
3/27/2019 13:46,BUY,MFSL,MIS,1600,115.25,COMPLETE
3/27/2019 13:46,BUY,MFSL,MIS,500,115.3,COMPLETE

I have already checked the file for invalid characters using Notepad++ ("Show All Characters").

I couldn't find any difference when comparing this file with a similar one that loads without errors. I just need help troubleshooting this, if someone can point me in the right direction.

Upvotes: 0

Views: 1920

Answers (1)

brad sanders

Reputation: 2539

The data you posted works fine for me, but it's several degrees removed from your source file. Specifying an encoding when opening the file may fix the problem. You can do this a couple of ways: open the file with the codecs module and name the encoding there, or pass the encoding argument to pd.read_csv().

    import codecs
    import pandas

    # Open for reading, decoding as UTF-16; substitute the file's actual encoding
    doc = codecs.open('document', 'r', 'UTF-16')
    df = pandas.read_csv(doc, sep=',')
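If you don't know the encoding, one way to narrow it down is to read the raw bytes and see which candidate decodes them. Below is a minimal sketch of that idea, not a definitive detector: the helper names `first_bad_byte` and `pick_encoding`, the candidate list, and the sample bytes are all my own assumptions for illustration. Byte 0xa0 is a non-breaking space in cp1252 and latin-1, which is why those are tried after utf-8.

```python
def first_bad_byte(raw: bytes, encoding: str = "utf-8"):
    """Return (offset, byte value) of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


def pick_encoding(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample (not the asker's file): a row where the space between
# date and time is a cp1252 non-breaking space, byte 0xA0.
sample = b"Time,Type\n3/27/2019\xa013:46,BUY\n"

print(first_bad_byte(sample))  # offset and value of the byte utf-8 rejects
print(pick_encoding(sample))   # first candidate that decodes cleanly
```

With a real file you would get the bytes from `open('document', 'rb').read()` and then pass the result to `pd.read_csv('document', encoding=...)`. Keep in mind that a clean decode only shows the encoding is plausible; latin-1 in particular accepts any byte sequence, so check the loaded text by eye.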

You also might want to sanitize your column names, as spaces and decimals can cause problems in referencing.

    df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False).str.replace('.', '', regex=False)
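To see what that clean-up does to the header row from the question, here is a plain-Python sketch of the same steps. The function name `clean_column` is hypothetical; it is just a stand-in so the effect can be checked without loading the file.

```python
import re


def clean_column(name: str) -> str:
    """Strip, lowercase, turn spaces into underscores, drop (, ) and ."""
    name = name.strip().lower().replace(" ", "_")
    return re.sub(r"[().]", "", name)


# Header row copied from the sample data in the question.
header = "Time,Type,Instrument,Product,Qty.,Avg. price,Status".split(",")
cleaned = [clean_column(col) for col in header]
print(cleaned)
# ['time', 'type', 'instrument', 'product', 'qty', 'avg_price', 'status']
```

The same function could be applied to a loaded frame with `df.columns = [clean_column(c) for c in df.columns]`, after which `df.qty` and `df.avg_price` work as attribute lookups.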

Upvotes: 1
