Reputation: 21253
I am reading in a file with:
pd.read_csv("file.csv", dtype={'ID_1':float})
The file looks like:
ID_0, ID_1,ID_2
a,002,c
b,004,d
c, ,e
n,003,g
Unfortunately read_csv
fails complaining it can't convert ' ' to a float.
What is the right way to read in a csv and convert anything that can't be converted to a float into NaN?
Upvotes: 2
Views: 868
Reputation: 15204
This is my understanding of reading the documentation:
def my_func(x):
try:
converted_value = float(x)
except ValueError:
converted_value = 'NaN'
return converted_value
pd.read_csv("file.csv", dtype={'ID_1':float}, converters={'ID_1':my_func})
(As I am at work now and don't have access to pandas
I cannot tell you if it works but it looks as it should (said every programmer ever..))
See also these relevant SO questions:
Inconsistent pandas read_csv dtype inference on mostly-integer string column in huge TSV file
...and the pandas.read_csv
documentation
Upvotes: 4
Reputation: 394051
If you don't specify the dtype
param and pass skipinitialspace=True
then it will just work:
In [4]:
t="""ID_0,ID_1,ID_2
a,002,c
b,004,d
c, ,e
n,003,g"""
pd.read_csv(io.StringIO(t), skipinitialspace=True)
Out[4]:
ID_0 ID_1 ID_2
0 a 2.0 c
1 b 4.0 d
2 c NaN e
3 n 3.0 g
So in your case:
pd.read_csv("file.csv", skipinitialspace=True)
will just work
You can see that the dtypes
are as expected:
In [5]:
pd.read_csv(io.StringIO(t), skipinitialspace=True).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
ID_0 4 non-null object
ID_1 3 non-null float64
ID_2 4 non-null object
dtypes: float64(1), object(2)
memory usage: 176.0+ bytes
Upvotes: 5