Reputation: 11
I have a csv file, say test.csv and looks like this
Data 1 | Data 2 | Data 3 |
1 | 2 | 3 | | | 6
If I open up the csv in notepad++ or text, it shows up like this
Data 1, Data 2, Data 3
1,2,3,,,6
instead of
Data 1, Data 2, Data 3, ,,
1,2,3,,,6
I get an error using pandas.read_csv because my rows have more columns than declared in my header row. Are there ways to find the maximum number of columns of the entire CSV?
Upvotes: 0
Views: 2122
Reputation: 30981
My proposition is to:
So the code should be:
df = pd.read_csv(io.StringIO(txt), skiprows=1,
names=['Data 1', 'Data 2', 'Data 3', 'Data 4', 'Data 5', 'Data 6'])
A detail to consider is how should be represented empty cells.
The default behaviour of read_csv is to treat them as NaN, but the downside is then that such columns are coerced to float.
Another possibility is to pass na_filter=False
parameter and then:
Note also that the format of your title row is corrupted in one more detail. The source of problem are spaces after commas: They become the initial part of the next column name.
So even if you had just 3 columns of data, the column names would be:
['Data 1', ' Data 2', ' Data 3']
(note the leading spaces). So this is yet another reason to skip the title row.
Upvotes: 1