Reputation: 1955
df
A B
0 a=10 b=20.10
1 a=20 NaN
2 NaN b=30.10
3 a=40 b=40.10
I tried:
df['A'] = df['A'].str.extract('(\d+)').astype(int)
df['B'] = df['B'].str.extract('(\d+)').astype(float)
But I get the following error:
ValueError: cannot convert float NaN to integer
And:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
How do I fix this?
Upvotes: 70
Views: 133952
Reputation: 12808
Since pandas 0.24 there is a built-in nullable integer dtype.
It allows missing values in an integer column, so you don't need to fill the NaNs first.
Note the capital "I" in 'Int64' in the code below: this is the pandas nullable integer, not the NumPy integer.
You need to use: .astype('Int64')
So, do this:
df['A'] = df['A'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
df['B'] = df['B'].str.extract(r'(\d+)', expand=False).astype('float').astype('Int64')
More info on pandas integer NA values:
https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions
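As a minimal sketch of the nullable-integer approach described above, the snippet below rebuilds a frame that mirrors the question's data (the construction itself is an assumption, since the question only shows the printed frame) and converts column A. The variable names `a` and `a_int` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question's data (assumed construction).
df = pd.DataFrame({
    "A": ["a=10", "a=20", np.nan, "a=40"],
    "B": ["b=20.10", np.nan, "b=30.10", "b=40.10"],
})

# Extract the first run of digits; rows without a match stay missing.
a = df["A"].str.extract(r"(\d+)", expand=False)

# Convert to float first, then to the nullable pandas integer dtype.
a_int = a.astype("float").astype("Int64")

print(a_int.dtype)         # Int64
print(a_int.isna().sum())  # 1
```

The missing row is kept as a missing value rather than being replaced by a placeholder number, which is the main difference from filling with 0.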
Upvotes: 37
Reputation: 862641
If some values in a column are missing (NaN) and the column is then converted to numeric, the dtype is always float. You cannot convert the values to int, only to float, because the type of NaN is float.
print (type(np.nan))
<class 'float'>
See the docs on how values are converted if at least one NaN is present:
integer > cast to float64
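The promotion rule can be checked directly. This is a small sketch with made-up series (`s_int` and `s_nan` are illustrative names, not from the question) showing that one NaN is enough to turn an integer series into float64, and that casting it back to int raises a ValueError.

```python
import numpy as np
import pandas as pd

s_int = pd.Series([10, 20, 40])          # no missing values
s_nan = pd.Series([10, 20, np.nan, 40])  # one missing value

print(s_int.dtype)  # integer dtype
print(s_nan.dtype)  # float64

# Casting a float series that contains NaN to int fails.
raised = False
try:
    s_nan.astype(int)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```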
If you need int values, replace NaN with some int, e.g. 0, using fillna, and then it works perfectly:
df['A'] = df['A'].str.extract(r'(\d+)', expand=False)
df['B'] = df['B'].str.extract(r'(\d+)', expand=False)
print (df)
A B
0 10 20
1 20 NaN
2 NaN 30
3 40 40
df1 = df.fillna(0).astype(int)
print (df1)
A B
0 10 20
1 20 0
2 0 30
3 40 40
print (df1.dtypes)
A int32
B int32
dtype: object
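The steps above can be put together into one self-contained sketch. The frame construction is an assumption based on the printed data in the question, and the intermediate `.astype(float)` is an addition of this sketch (not part of the answer's code) so that the fill value and the column share a numeric dtype before the final cast.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the question's data (assumed construction).
df = pd.DataFrame({
    "A": ["a=10", "a=20", np.nan, "a=40"],
    "B": ["b=20.10", np.nan, "b=30.10", "b=40.10"],
})

# Extract the leading digits of each column as strings (NaN where absent).
out = pd.DataFrame({
    col: df[col].str.extract(r"(\d+)", expand=False)
    for col in ["A", "B"]
})

# Make the columns numeric, fill missing values with 0, then cast to int.
out = out.astype(float).fillna(0).astype(int)
print(out)
```

Note that after filling, a 0 that came from a missing value cannot be told apart from a real 0, so this only fits data where 0 is not a meaningful value.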
Upvotes: 81