Reading variable number of columns in pandas

Question

I have a poorly formatted delimited file in which the delimiter is sometimes used incorrectly, so some rows appear to have a different number of columns than others.

When I run

pd.read_csv('patentHeader.txt', sep="|", header=0)

the process dies with this error:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 1034558, saw 15

Is there a way to have pandas skip these lines and continue? Or, put differently, is there some way to make read_csv more flexible about how many columns it encounters?

Jianxun Li · Accepted Answer

Try this.

pd.read_csv('patentHeader.txt', sep="|", header=0, error_bad_lines=False)

error_bad_lines: if False, any line that causes an error (a "bad line") is skipped instead of raising an exception, and pandas reports each skipped line in a warning.
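
For newer pandas versions, note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; the replacement is on_bad_lines. The sketch below assumes pandas 1.3 or later and uses a small made-up in-memory sample (the column names and rows are hypothetical, not taken from patentHeader.txt) to show the same skipping behaviour.

```python
import io

import pandas as pd

# Hypothetical stand-in for patentHeader.txt: the third data row contains a
# stray "|" and therefore has four fields instead of the expected three.
sample = io.StringIO(
    "id|title|year\n"
    "1|Widget|2001\n"
    "2|Gadget|2003\n"
    "3|Broken|row|2005\n"
    "4|Gizmo|2007\n"
)

# on_bad_lines="skip" silently drops rows with too many fields (pandas >= 1.3).
# Use on_bad_lines="warn" instead to also get a warning for each skipped row.
df = pd.read_csv(sample, sep="|", header=0, on_bad_lines="skip")

print(df)
```

Only rows with too many fields count as bad lines here; rows with too few fields are kept and padded with NaN, so they may still need cleaning afterwards.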
