Reputation: 184
I am trying to read a CSV using pd.read_csv, but I get an error:
UnicodeDecodeError Traceback (most recent call last)
pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 8: invalid start byte
During handling of the above exception, another exception occurred:
UnicodeDecodeError Traceback (most recent call last)
ipython-input-84-c0272ccf19e6 in module
Sample of my data:
Time,Type,Instrument,Product,Qty.,Avg. price,Status
3/27/2019 13:46,BUY,MFSL,MIS,1600,115.25,COMPLETE
3/27/2019 13:46,BUY,MFSL,MIS,500,115.3,COMPLETE
I have already tried checking for invalid characters using Notepad++ ("Show all characters").
I couldn't find any difference when comparing this file with a similar one that loads without errors. I need help troubleshooting this; if someone can point me in the right direction, I would appreciate it.
Upvotes: 0
Views: 1920
Reputation: 2539
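The data you posted works fine for me, but it is several steps removed from your source file, so whatever bytes cause the error have probably been lost in the copy. Byte 0xa0 is a non-breaking space in Latin-1 and Windows-1252, which suggests the file may not be UTF-8; a non-breaking space is also easy to miss in an editor. Specifying the encoding when opening the file may fix the problem. You can do this in a couple of ways: open the file with the codecs module using an explicit encoding, or pass the encoding argument to read_csv().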
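import codecs
import pandas
# 'document' and 'UTF-16' are placeholders: use your file path and its actual encoding
doc = codecs.open('document', 'r', 'UTF-16')  # open for reading with an explicit encoding
df = pandas.read_csv(doc, sep=',')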
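If you don't know the encoding, one option is to try a few candidates in order. The following is only a sketch: the helper name read_csv_bytes and the candidate list are made up for illustration, and the sample bytes are invented to reproduce the 0xa0 error rather than taken from your file.

```python
import io

import pandas as pd


def read_csv_bytes(raw, encodings=("utf-8", "cp1252", "latin-1"), **kwargs):
    """Decode raw CSV bytes with the first encoding that works.

    Hypothetical helper, not part of pandas. Returns (DataFrame, encoding).
    latin-1 maps every byte to a character, so it never fails; keep it last.
    """
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        return pd.read_csv(io.StringIO(text), **kwargs), enc
    raise ValueError("none of the candidate encodings could decode the data")


# Invented sample: a stray 0xa0 byte (non-breaking space in cp1252) after MFSL
raw = b"Time,Type,Instrument\n3/27/2019 13:46,BUY,MFSL\xa0\n"
df, used = read_csv_bytes(raw)
print(used)
```

For a file on disk, the bytes can come from open(path, 'rb').read(). Once you know which encoding works, passing it directly is simpler: pd.read_csv(path, encoding='cp1252'). A successful decode only means the bytes are valid in that encoding, not that it is the right one, so check a few rows by eye.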
You may also want to sanitize your column names, since spaces and periods can cause problems when referencing columns (for example, with attribute access).
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False).str.replace('.', '', regex=False)
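As a quick check, here is the same kind of clean-up applied to the header row from the question. It is a minimal sketch that builds an empty DataFrame from those column names; regex=False is passed so that '.' is treated as a literal period rather than a regex wildcard.

```python
import pandas as pd

# Column names copied from the sample header in the question
df = pd.DataFrame(
    columns=["Time", "Type", "Instrument", "Product", "Qty.", "Avg. price", "Status"]
)

df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)  # spaces become underscores
    .str.replace(".", "", regex=False)   # drop literal periods
)

cleaned = list(df.columns)
print(cleaned)
```

After this, "Qty." becomes qty and "Avg. price" becomes avg_price, so both can be referenced as df.qty and df.avg_price.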
Upvotes: 1