itzy

Reputation: 11765

Reading variable number of columns in pandas

I have a poorly formatted delimited file with errors in the delimiter, so some rows appear to have a different number of columns than others.

When I run

pd.read_csv('patentHeader.txt', sep="|", header=0)

the process dies with this error:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 1034558, saw 15

Is there a way to have pandas skip these lines and continue? Or, put differently, is there some way to make read_csv more flexible about how many columns it encounters?

Upvotes: 1

Views: 651

Answers (1)

Jianxun Li

Reputation: 24752

Try this.

pd.read_csv('patentHeader.txt', sep="|", header=0, error_bad_lines=False)

error_bad_lines: if False, any line that causes a parsing error (a "bad line") is skipped instead of stopping the read, and pandas reports the lines it skipped.
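
For newer pandas versions, here is a rough sketch of the same idea. error_bad_lines was deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines="skip" takes its place. The inline sample below is made up to stand in for patentHeader.txt, and the column names are hypothetical, so treat this as an illustration and not a drop-in fix for your file.

```python
import io

import pandas as pd

# Hypothetical stand-in for patentHeader.txt: the third data row
# carries an extra "|" and therefore one field too many.
sample = io.StringIO(
    "id|name|year\n"
    "1|alpha|2001\n"
    "2|beta|2002\n"
    "3|gamma|2003|stray\n"
    "4|delta|2004\n"
)

# Assumes pandas 1.3 or later. on_bad_lines="skip" silently drops rows
# with too many fields; on_bad_lines="warn" drops them and emits a warning.
df = pd.read_csv(sample, sep="|", header=0, on_bad_lines="skip")

print(df)
```

Note that only rows with too many fields count as bad lines. Rows with too few fields are kept and padded with NaN, so it is still worth checking the result for unexpected missing values.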

Upvotes: 2
