Sun
Sun

Reputation: 1955

error using astype when NaN exists in a dataframe

df
     A     B  
0   a=10   b=20.10
1   a=20   NaN
2   NaN    b=30.10
3   a=40   b=40.10

I tried :

df['A'] = df['A'].str.extract('(\d+)').astype(int)
df['B'] = df['B'].str.extract('(\d+)').astype(float)

But I get the following error:

ValueError: cannot convert float NaN to integer

And:

AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

How do I fix this ?

Upvotes: 70

Views: 133952

Answers (2)

Sander van den Oord
Sander van den Oord

Reputation: 12808

From pandas >= 0.24 there is now a built-in pandas integer.
This does allow integer nan's, so you don't need to fill na's.
Notice the capital in 'Int64' in the code below.
This is the pandas integer, instead of the numpy integer.

You need to use: .astype('Int64')

So, do this:

df['A'] = df['A'].str.extract('(\d+)', expand=False).astype('float').astype('Int64')
df['B'] = df['B'].str.extract('(\d+)', expand=False).astype('float').astype('Int64')

More info on pandas integer na values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions

Upvotes: 37

jezrael
jezrael

Reputation: 862641

If some values in column are missing (NaN) and then converted to numeric, always dtype is float. You cannot convert values to int. Only to float, because type of NaN is float.

print (type(np.nan))
<class 'float'>

See docs how convert values if at least one NaN:

integer > cast to float64

If need int values you need replace NaN to some int, e.g. 0 by fillna and then it works perfectly:

df['A'] = df['A'].str.extract('(\d+)', expand=False)
df['B'] = df['B'].str.extract('(\d+)', expand=False)
print (df)
     A    B
0   10   20
1   20  NaN
2  NaN   30
3   40   40

df1 = df.fillna(0).astype(int)
print (df1)
    A   B
0  10  20
1  20   0
2   0  30
3  40  40

print (df1.dtypes)
A    int32
B    int32
dtype: object

Upvotes: 81

Related Questions