The Singularity

Reputation: 2698

How do I replace <NA> with NaN in a DataFrame?

Some columns in my DataFrame have instances of <NA> which are of type pandas._libs.missing.NAType.

I'd like to replace them with NaN using np.nan.

I have seen questions where the instances of <NA> can be replaced when using pd.read_csv().

But since my Pandas DataFrame is created from a Spark DataFrame I do not use the pd.read_csv() function.

Please advise.

Upvotes: 9

Views: 9601

Answers (4)

Eduard

Reputation: 1

replace didn't work for me either, but the following code worked:

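df['my_col'] = df['my_col'].map(lambda x: np.nan if x is pd.NA else x)  # a dict such as {pd.NA: np.nan} would also turn every other value into NaN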
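One caveat with Series.map: when it is given a dict, every value that is not a key in that dict comes back as NaN, so a callable is the safer way to swap only pd.NA. A minimal sketch, assuming an object-dtype column (the name my_col and the sample values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column holding pd.NA next to plain integers
s = pd.Series([pd.NA, 4, 8], dtype=object, name="my_col")

# Dict lookup: 4 and 8 are not keys in the dict, so they become NaN as well
via_dict = s.map({pd.NA: np.nan})

# Callable: only pd.NA is swapped, every other value passes through unchanged
via_func = s.map(lambda x: np.nan if x is pd.NA else x)

print(via_dict.isna().tolist())
print(via_func.isna().tolist())
```

Comparing the two isna() lists shows whether the real values survived the mapping.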

Upvotes: 0

Chiheb Nexus

Reputation: 9257

With pandas v1.3.1 and NumPy v1.20.3 you can use df.where(), which replaces values wherever the condition is False, like below:

$> df = pd.DataFrame({'age':[pd.NA, 4, 8]})
$> print(df)
    age
0  <NA>
1     4
2     8
$> print(type(df.iloc[0]['age']))
   pandas._libs.missing.NAType
$> df = df.where(pd.notnull(df), np.nan)  # Replace pd.NA, np.nan and None by np.nan
$> print(df)
   age
0  NaN
1    4
2    8
$> print(type(df.iloc[0]['age']))
   float

PS: You can also do:

$> df = df.where(~pd.isna(df), np.nan)

Upvotes: 1

piedpiper

Reputation: 398

I didn't have any luck with the replace solution, but I was able to convert <NA> to np.nan by casting the column to float: df['my_col'].astype(float).
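A minimal sketch of the cast, assuming the column uses pandas' nullable Int64 dtype (one common way <NA> ends up in a DataFrame; the column name and values are made up). Note that astype returns a new Series, so the result has to be assigned back:

```python
import numpy as np
import pandas as pd

# Hypothetical column with the nullable Int64 dtype, which stores missing values as pd.NA
df = pd.DataFrame({"my_col": pd.array([None, 4, 8], dtype="Int64")})

# astype does not modify the column in place, so assign the result back
df["my_col"] = df["my_col"].astype(float)

print(df["my_col"].dtype)
print(df["my_col"].isna().tolist())
```

After the cast the column is a regular float column, so the missing entry is an ordinary np.nan.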

Upvotes: 4

jezrael

Reputation: 862611

Use replace, but you also need to upgrade pandas.

df = pd.DataFrame({'age':[pd.NA, 4, 8]})

df = df.replace(pd.NA, np.nan)
print(df)
   age
0  NaN
1  4.0
2  8.0

Upvotes: 0
