user1496984

Reputation: 11575

Use NaN for values that can't be cast using astype

I have a very large Pandas DataFrame that looks like this:

>>> d = pd.DataFrame({"a": ["1", "U", "3.4"]})
>>> d
     a
0    1
1    U
2  3.4

Currently the column has dtype object:

>>> d.dtypes
a    object
dtype: object

I'd like to convert this column to float so that I can use groupby() and compute the mean. When I try astype, I get an error, as expected, because one of the strings can't be cast to float:

>>> d.a.astype(float)
ValueError: could not convert string to float: 'U'

What I'd like to do is cast all the elements to float and replace the ones that can't be cast with NaN.

How can I do this?

I tried setting raise_on_error=False, but it doesn't work: the values are left unchanged and the dtype is still object.

>>> d.a.astype(float, raise_on_error=False)
0      1
1      U
2    3.4
Name: a, dtype: object

Upvotes: 4

Views: 5351

Answers (1)

Alex Riley

Reputation: 176810

Use pd.to_numeric and specify errors='coerce' to force strings that can't be parsed as a numeric value to become NaN:

>>> pd.to_numeric(d['a'], errors='coerce')
0    1.0
1    NaN
2    3.4
Name: a, dtype: float64
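
Since the goal in the question is a groupby() followed by a mean, here is a minimal sketch of the whole flow. The grouping column "g" and its values are made up for illustration (the question doesn't show one), so adapt the names to your data:

```python
import pandas as pd

# Hypothetical frame: column "g" is an assumed grouping key, not from the question.
d = pd.DataFrame({"g": ["x", "x", "y"], "a": ["1", "U", "3.4"]})

# Assign the result back: strings that can't be parsed become NaN
# and the column ends up as float64.
d["a"] = pd.to_numeric(d["a"], errors="coerce")

# mean() skips NaN by default, so the coerced "U" doesn't affect group "x".
means = d.groupby("g")["a"].mean()
print(means)
```

If several columns need the same treatment, something like d[cols].apply(pd.to_numeric, errors='coerce') should do it in one go, where cols is your own list of column names.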

Upvotes: 13
