WNG

Reputation: 3805

Pandas : Raise error when a line is incomplete

pandas refuses to read files in which a row after the first line has too many fields (more commas than the header).

Trying to read_csv the following:

col1,col2,col3
foo,1,2
bar,2,3
zob,0,3,4

gives me an error.

However, pandas accepts the following, no matter which read_csv options I tried:

col1,col2,col3
foo,1,2
bar,2,3
zob,0

It simply treats the value of col3 in the last row as null.

Is there a pandas way to raise an exception when this happens (too few fields in a row)? In my case, it means the source of the file is faulty and the file needs to be downloaded again.

It seems error_bad_lines only concerns lines with too many commas.

I could count the commas on each line separately before calling read_csv, but I'd like to know whether pandas has an option for this, since that would be more natural and keep the code readable.
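For reference, the manual pre-check mentioned above could be sketched roughly as follows. It uses the standard csv module rather than counting raw commas, so quoted fields containing commas are handled. The function name check_field_counts is hypothetical, and the sketch assumes the first line is the header and that blank lines should be ignored (as read_csv does by default).

```python
import csv
import io


def check_field_counts(text, delimiter=","):
    """Raise ValueError if a row's field count differs from the header's.

    Hypothetical helper: assumes the first line is the header.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    for row in reader:
        if not row:
            # csv.reader yields an empty list for a blank line; skip it.
            continue
        if len(row) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, "
                f"got {len(row)}"
            )
    return len(header)
```

If the check passes, the same text can then be handed to read_csv through io.StringIO.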

Upvotes: 2

Views: 814

Answers (1)

MaxU - stand with Ukraine

Reputation: 210872

UPDATE:

Assuming a valid file does not contain any NaN values, a short row shows up as a NaN after reading:

In [85]: pd.read_csv(fn)
Out[85]:
  col1  col2  col3
0  foo     1   2.1
1  bar     2   3.1
2  zob     0   NaN

so you can raise an exception if the following condition is met:

In [86]: pd.read_csv(fn).isnull().any().any()
Out[86]: True
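A minimal sketch of turning that condition into an exception is shown below. The wrapper name read_csv_strict is hypothetical, and it rests on the same assumption as above: a valid file never contains missing values.

```python
import io

import pandas as pd


def read_csv_strict(source, **kwargs):
    """Read a CSV and raise ValueError if any cell is missing.

    Hypothetical wrapper: assumes a valid file has no NaN values at all.
    """
    df = pd.read_csv(source, **kwargs)
    if df.isnull().any().any():
        # Collect the index labels of rows that contain at least one NaN.
        bad_rows = df.index[df.isnull().any(axis=1)].tolist()
        raise ValueError(f"missing values in rows {bad_rows}")
    return df


data = "col1,col2,col3\nfoo,1,2.1\nbar,2,3.1\nzob,0\n"
try:
    read_csv_strict(io.StringIO(data))
except ValueError as exc:
    print("faulty file:", exc)
```

Note that this check cannot tell a truncated row (zob,0) from a row with an explicitly empty last field (zob,0,), so it only fits when both cases should be rejected.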

Old answer:

Possible solution:

Consider the following input CSV file:

col1,col2,col3
foo,1,2.1
bar,2,3.1
zob,0

The following reads it without error and fills the missing value with NaN:

In [50]: pd.read_csv(fn, dtype={'col3':'float'})
Out[50]:
  col1  col2  col3
0  foo     1   2.1
1  bar     2   3.1
2  zob     0   NaN

But if we instruct pandas not to treat empty strings as NaN, it throws an exception:

In [51]: pd.read_csv(fn, na_values=['NAN','NaN','#NA'], keep_default_na=False, dtype={'col3':'float'})
...
skipped
...
ValueError: could not convert string to float:

Upvotes: 1
