Reputation: 3805
pandas refuses to read files that have too many commas (after the first line):
Trying to read_csv the following:
col1,col2,col3
foo,1,2
bar,2,3
zob,0,3,4
will give me an error.
However, pandas accepts the following, no matter which options I tried in read_csv:
col1,col2,col3
foo,1,2
bar,2,3
zob,0
and will just consider the value in col3 for the last line to be null.
Is there any pandas way to raise an exception when this happens (too few fields in one row)? (In my case, it means the source of the file is faulty and the file needs to be downloaded again.)
It seems error_bad_lines only concerns lines with too many commas.
I can count the number of commas on each line separately before calling read_csv (see the sketch below), but I'd like to know whether an option exists within pandas, since that seems more natural and keeps the code readable.
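For reference, the separate comma count I mention would look roughly like this (a sketch only; fn stands for the path of the downloaded file):

import pandas as pd

# pre-check: every data line must have the same number of commas as the header
with open(fn) as f:
    expected = next(f).count(',')              # commas in the header line
    for lineno, line in enumerate(f, start=2):
        if line.rstrip('\n').count(',') != expected:
            raise ValueError('wrong number of fields on line %d' % lineno)

df = pd.read_csv(fn)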
Upvotes: 2
Views: 814
Reputation: 210872
UPDATE:
If a valid file should not contain any NaN values, notice that reading the faulty file produces a NaN for the short row:
In [85]: pd.read_csv(fn)
Out[85]:
col1 col2 col3
0 foo 1 2.1
1 bar 2 3.1
2 zob 0 NaN
so you can raise an exception if the following condition is met:
In [86]: pd.read_csv(fn).isnull().any().any()
Out[86]: True
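A minimal sketch of the whole check, assuming fn is the path to the downloaded file and that a valid file never contains missing values:

import pandas as pd

df = pd.read_csv(fn)
if df.isnull().any().any():    # any missing value => some row had too few fields
    raise ValueError('file is faulty, download it again')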
Old answer:
Possible solution:
consider the following input CSV file:
col1,col2,col3
foo,1,2.1
bar,2,3.1
zob,0
the following works:
In [50]: pd.read_csv(fn, dtype={'col3':'float'})
Out[50]:
col1 col2 col3
0 foo 1 2.1
1 bar 2 3.1
2 zob 0 NaN
but if we instruct pandas not to treat empty strings as NaN's, then it'll throw an exception:
In [51]: pd.read_csv(fn, na_values=['NAN','NaN','#NA'], keep_default_na=False, dtype={'col3':'float'})
...
skipped
...
ValueError: could not convert string to float:
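In the asker's scenario you could catch that ValueError and trigger a re-download; a rough sketch, assuming fn is the file path and redownload() is a hypothetical helper for fetching the file again:

import pandas as pd

try:
    df = pd.read_csv(fn,
                     na_values=['NAN', 'NaN', '#NA'],
                     keep_default_na=False,
                     dtype={'col3': 'float'})
except ValueError:
    # the short row leaves col3 as an empty string, which can't be cast to float
    redownload(fn)   # hypothetical helper -- replace with your own re-download logic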
Upvotes: 1