Reputation: 11765
I have a poorly formatted delimited file with errors in the delimiter, so some rows appear to have a different number of columns than others.
When I run
pd.read_csv('patentHeader.txt', sep="|", header=0)
the process dies with this error:
CParserError: Error tokenizing data. C error: Expected 9 fields in line 1034558, saw 15
Is there a way to have pandas skip these lines and continue? Put differently, is there some way to make read_csv more flexible about how many columns it encounters?
Upvotes: 1
Views: 651
Reputation: 24752
Try this.
pd.read_csv('patentHeader.txt', sep="|", header=0, error_bad_lines=False)
error_bad_lines: if False, any line that causes a parsing error is skipped instead of raising an exception, and pandas reports which lines were skipped.
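One caveat for newer installs: error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines="skip" does the same job. Below is a minimal sketch that tries the newer keyword first and falls back to the older one. The in-memory sample and its column names are made up for illustration and stand in for patentHeader.txt.

```python
import io

import pandas as pd

# Hypothetical stand-in for patentHeader.txt: the third data row
# contains a stray "|" and therefore has 4 fields instead of 3.
sample = io.StringIO(
    "id|title|year\n"
    "1|Widget|1999\n"
    "2|Gadget|2001\n"
    "3|Bro|ken|2003\n"
    "4|Gizmo|2005\n"
)

try:
    # pandas 1.3 and later: skip rows with too many fields.
    df = pd.read_csv(sample, sep="|", header=0, on_bad_lines="skip")
except TypeError:
    # Older pandas does not know on_bad_lines; use the original keyword.
    sample.seek(0)
    df = pd.read_csv(sample, sep="|", header=0, error_bad_lines=False)

print(df)
```

Only rows with too many fields count as bad lines here. Rows with too few fields are kept and padded with NaN, so it is worth checking the result for missing values afterwards.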
Upvotes: 2