Reputation: 7089
I'm using pandas.read_csv to read a tab-delimited file and am running into the error: Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398
After much searching, it seems that the offending entry is: "– SO ,쳌 \\ ?Œ ø ,d -L ,ú ,‚ ZO
Removing the quotation mark seems to solve things. I've got a lot of large files with a lot of strange characters in them, so this will no doubt repeat itself. Do I need to strip unmatched quotation marks ahead of time, or is there some way around this?
Upvotes: 1
Views: 1726
Reputation: 375445
There is a quoting argument for read_csv:
quoting : int or csv.QUOTE_* instance, default None
Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
Default (None) results in QUOTE_MINIMAL behavior.
These are described in the csv docs.
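To make the difference between the quoting modes concrete, here is a minimal sketch using only the standard-library csv module. The sample text and the parse helper are made up for illustration; they are not taken from the question's file.

```python
import csv
import io

# Hypothetical tab-delimited sample: the second data row contains a lone
# double quote, similar in spirit to the offending entry in the question.
sample = 'a\tb\tc\n1\t"x\t3\n4\t5\t6\n'


def parse(text, quoting):
    """Parse tab-delimited text with the given csv.QUOTE_* mode."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


# QUOTE_MINIMAL treats the lone quote as the start of a quoted field, so the
# tabs and newlines that follow are absorbed into that field.
minimal_rows = parse(sample, csv.QUOTE_MINIMAL)

# QUOTE_NONE treats the quote as an ordinary character, so every line keeps
# its three fields.
none_rows = parse(sample, csv.QUOTE_NONE)

print(len(minimal_rows), len(none_rows))
print(none_rows)
```

With QUOTE_MINIMAL the unmatched quote swallows the rest of the input into a single field, which is the same kind of field-count mismatch the tokenizer error reports.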
Try setting quoting=3 (i.e. QUOTE_NONE).
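As a rough sketch of what that looks like in practice, assuming a small in-memory stand-in for the real file (the data string below is invented for the example):

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the real tab-delimited file; the first data row
# holds a lone double quote.
data = 'a\tb\tc\n1\t"x\t3\n4\t5\t6\n'

# csv.QUOTE_NONE is the named constant for quoting=3: quote characters are
# read as plain data instead of opening a quoted field.
df = pd.read_csv(io.StringIO(data), sep="\t", quoting=csv.QUOTE_NONE)

print(df)
```

Note the trade-off: the quotation mark stays in the parsed value, and fields that are properly quoted are no longer unquoted, so a tab inside a quoted field would split it. That should be fine if the files never rely on quoting to protect delimiters.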
Upvotes: 4