Reputation: 3937
I am using the code below to read a CSV file into a DataFrame. At first I got the error pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
so I changed pd.read_csv('D:/TRYOUT.csv')
to pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False)
as suggested here. However, I now get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 1: invalid continuation byte
on the same line.
import pandas as pd
def ExcelFileReader():
    # Read the CSV, skipping malformed rows instead of raising a parser error
    mergedf = pd.read_csv('D:/TRYOUT.csv', error_bad_lines=False)
    return mergedf
Upvotes: 0
Views: 2168
Reputation: 21
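If you would like to skip the rows that cause the error and ignore the malformed data, use:
pd.read_csv(file_path, encoding="utf8", on_bad_lines="skip", encoding_errors="ignore")
Both on_bad_lines and encoding_errors require pandas 1.3 or later. on_bad_lines="skip" replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0.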
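To make clearer what those two options amount to, here is a minimal standard-library sketch. It is not how pandas implements them; the helper name read_rows_leniently and the sample bytes are made up for illustration. It drops bytes that are not valid UTF-8, then skips rows that have more fields than the header, which is the situation the tokenizer error reports.

```python
import csv
import io

def read_rows_leniently(raw: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: drop undecodable bytes, skip over-long rows."""
    # errors="ignore" silently discards bytes that are invalid in `encoding`
    text = raw.decode(encoding, errors="ignore")
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Keep non-empty rows that do not have more fields than the header
    rows = [row for row in reader if row and len(row) <= len(header)]
    return header, rows

# Made-up sample: row 3 contains the invalid byte 0xf0, row 4 has an extra field
sample = b"name\nalice\nb\xf0b\ncarol,extra\n"
header, rows = read_rows_leniently(sample)
print(header, rows)
```

Note that both steps lose data silently: the 0xf0 byte disappears from the third line and the fourth line is dropped entirely, so this is only appropriate when that loss is acceptable.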
Upvotes: 0
Reputation: 77
I had a similar problem and had to use
utf-8-sig
as the encoding.
I used utf-8-sig because, unlike a Latin-only encoding, it still handles non-Latin characters correctly if you ever get them. It is UTF-8 that additionally skips a leading byte order mark (BOM). There are a few ways around the problem, so choose whichever best suits your needs.
Hope that helps.
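As a small illustration of what utf-8-sig changes compared with plain utf-8, the sketch below decodes the same made-up bytes both ways using only the standard library. The sample data is an assumption, not taken from the question's file.

```python
import codecs

# A UTF-8 file that starts with a byte order mark, as some Windows tools write it
raw = codecs.BOM_UTF8 + "id,name\n1,Zoë\n".encode("utf-8")

plain = raw.decode("utf-8")      # the BOM survives as the character U+FEFF
sig = raw.decode("utf-8-sig")    # the BOM is stripped

first_col_plain = plain.split(",")[0]
first_col_sig = sig.split(",")[0]

# utf-8-sig is still UTF-8, so a byte sequence that is invalid UTF-8 still fails
try:
    b"a\xf0b".decode("utf-8-sig")
    still_fails = False
except UnicodeDecodeError:
    still_fails = True

print(repr(first_col_plain), repr(first_col_sig), still_fails)
```

So utf-8-sig is worth trying when the first column name comes out with a stray invisible character, but on its own it will not make a file decodable if the file is not UTF-8 in the first place.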
Upvotes: 0
Reputation: 36555
If you're on Windows, you probably need to use pd.read_csv(filename, encoding='latin-1')
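One way to apply this without hard-coding the encoding is to try UTF-8 first and fall back to latin-1. The sketch below uses only the standard library; decode_with_fallback is a hypothetical helper and the sample bytes are made up to mirror the error in the question (byte 0xf0 at position 1).

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first encoding that works."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")

# 0xf0 followed by an ASCII byte is invalid UTF-8 but is 'ð' in latin-1
raw = b"a\xf0b,1\n"
text, used = decode_with_fallback(raw)
print(used, repr(text))
```

The encoding found this way can then be passed on, for example pd.read_csv(filename, encoding=used). Keep in mind that latin-1 maps every possible byte to a character, so it never raises; if the file is really in some other encoding, the text will load but some characters will be wrong.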
Upvotes: 1