Reputation: 2698
Some columns in my DataFrame contain instances of <NA>, which are of type pandas._libs.missing.NAType. I'd like to replace them with NaN using np.nan.
I have seen questions where the instances of <NA> can be replaced when using pd.read_csv(). But since my pandas DataFrame is created from a Spark DataFrame, I do not use the pd.read_csv() function.
Please advise.
Upvotes: 9
Views: 9601
Reputation: 1
replace didn't work for me either, but the following code worked for me:
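df['my_col'] = df['my_col'].map(lambda v: np.nan if v is pd.NA else v)  # a dict such as {pd.NA: np.nan} would also turn every value that is not a key into NaN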
Upvotes: 0
Reputation: 9257
Using pandas v1.3.1 and NumPy v1.20.3, you can use df.where(), which replaces values where the condition is False, like below:
$> df = pd.DataFrame({'age':[pd.NA, 4, 8]})
$> print(df)
age
0 <NA>
1 4
2 8
$> print(type(df.iloc[0]['age']))
pandas._libs.missing.NAType
$> df = df.where(pd.notnull(df), np.nan) # Replace pd.NA, np.nan and None by np.nan
$> print(df)
age
0 NaN
1 4
2 8
$> print(type(df.iloc[0]['age']))
float
PS: You can also do:
$> df = df.where(~pd.isna(df), np.nan)
Upvotes: 1
Reputation: 398
I didn't have any luck with the replace solution, but I was able to convert <NA> to np.nan by converting the column to float: df['my_col'].astype(float).
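As a rough illustration of the float conversion, here is a minimal sketch. It assumes the column uses pandas' nullable "Int64" extension dtype, which may not match what your Spark conversion produces; a plain object column holding pd.NA may not convert this way. The sample data is made up.

```python
import numpy as np
import pandas as pd

# Assumed setup: a nullable integer column ("Int64" extension dtype).
df = pd.DataFrame({"my_col": pd.array([pd.NA, 4, 8], dtype="Int64")})

# astype returns a new Series, so assign it back to keep the result.
df["my_col"] = df["my_col"].astype(float)

print(df["my_col"].dtype)
print(df["my_col"].isna().tolist())
```

Note that the result has to be assigned back to the column, since astype does not modify the DataFrame in place.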
Upvotes: 4
Reputation: 862611
Use replace, but it is also necessary to upgrade pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({'age':[pd.NA, 4, 8]})
df = df.replace(pd.NA, np.nan)
print(df)
age
0 NaN
1 4.0
2 8.0
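If replace does not behave the same way on your pandas version, one possible alternative is to pick the conversion per column based on its dtype. The sketch below is only an illustration under assumptions: na_to_nan is a hypothetical helper name, and the sample frame (a nullable Int64 column plus an object column) is invented for the example.

```python
import numpy as np
import pandas as pd


def na_to_nan(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with pd.NA replaced by np.nan (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if (pd.api.types.is_extension_array_dtype(dtype)
                and pd.api.types.is_numeric_dtype(dtype)):
            # Nullable numeric columns (e.g. Int64, Float64): cast to NumPy float64,
            # which stores missing values as NaN.
            out[col] = out[col].astype("float64")
        elif pd.api.types.is_object_dtype(dtype):
            # Object columns: keep the values, swap missing entries for np.nan.
            out[col] = out[col].where(out[col].notna(), np.nan)
    return out


# Made-up sample data for illustration.
sample = pd.DataFrame({
    "age": pd.array([pd.NA, 4, 8], dtype="Int64"),
    "name": pd.Series([pd.NA, "a", "b"], dtype=object),
})
clean = na_to_nan(sample)
print(clean.dtypes)
```

Columns with other dtypes (plain NumPy numeric columns, categoricals, strings) are left untouched in this sketch.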
Upvotes: 0