edwardw

Reputation: 13952

Trailing delimiter confuses pandas read_csv

A CSV (comma-delimited) file whose lines have an extra trailing delimiter seems to confuse pandas.read_csv. (The data file is [1].)

It treats the extra delimiter as if there were an extra column, so each data row has one more field than the header has names. pandas.read_csv then uses the first column as row labels. The overall effect is that the columns and headers are no longer aligned: the first column becomes the row labels, the second column is named by the first header, and so on.
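A minimal reproduction of the misalignment, using a small made-up inline CSV instead of the FEC file (the column names a, b, c and the values are hypothetical, chosen only for illustration):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: three header names, but every data row ends with a
# trailing comma, so each row actually has four fields.
data = "a,b,c\n1,2,3,\n4,5,6,\n"

df = pd.read_csv(StringIO(data))

# One more field per row than header names: pandas takes the first field as
# the row label, and every remaining value lands one column to the left.
print(df)
print(df.index.tolist())  # row labels taken from the first field
```

The values 1 and 4 end up as the index, column a holds what should be column b, and column c is left holding only the empty trailing fields (NaN).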

It is quite annoying. Any idea how to tell pandas.read_csv to do the right thing? I couldn't find one.

Great book, BTW.


[1]: 2012 FEC Election Database from chapter 9 of the book Python for Data Analysis

Upvotes: 20

Views: 9823

Answers (3)

k-nut

Reputation: 3575

For anyone still finding this: Wes wrote a blog post about it. The problem is that if a row has one value more than the header, the extra leading value is treated as the row's name (its index label).

This behaviour can be changed by passing index_col=False to read_csv.
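A short sketch of the index_col=False fix, again on hypothetical inline data (the names a, b, c are made up for illustration; the real FEC file would be read from disk):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with a trailing delimiter on every data row.
data = "a,b,c\n1,2,3,\n4,5,6,\n"

# index_col=False tells pandas not to use the first field as row labels,
# so the values stay under their own headers and the empty trailing
# field is discarded.
df = pd.read_csv(StringIO(data), index_col=False)

print(df)
print(list(df.columns))
```

The result keeps the default integer index, and each value sits under the header it belongs to.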

Upvotes: 19

Wes McKinney

Reputation: 105531

I created a GitHub issue to look into handling this automatically:

https://github.com/pydata/pandas/issues/2442

I think the FEC file format changed slightly, causing this annoying issue. If you use the version posted at http://github.com/pydata/pydata-book, you hopefully won't have that problem.

Upvotes: 6

edwardw

Reputation: 13952

Well, there's a very simple workaround: add a dummy column name to the header when reading the CSV file:

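import pandas

cols = ...  # list of the file's column names
cols.append('')  # dummy name for the empty field after the trailing delimiter
records = pandas.read_csv('filename.txt', skiprows=1, names=cols)  # skiprows=1 skips the file's own header line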

The columns and header are then aligned again.
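A self-contained sketch of the dummy-column workaround on hypothetical inline data (the names a, b, c stand in for the real FEC header names). Dropping the dummy column afterwards is an extra step not in the answer above:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the header has three names, but each data row has
# four fields because of the trailing comma.
data = "a,b,c\n1,2,3,\n4,5,6,\n"
cols = ["a", "b", "c"]  # in practice, the file's real header names

# Supply the names explicitly, plus an empty dummy name for the trailing
# field, and skip the file's own header line.
records = pd.read_csv(StringIO(data), skiprows=1, names=cols + [""])

# The dummy column holds only NaN, so drop it by position.
records = records.iloc[:, :-1]
print(records)
```

Dropping by position avoids depending on how the empty column name is stored.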

Upvotes: 5
