mazjin
mazjin

Reputation: 90

Replacing NaNs in DataFrame not working due to non-float NaN values

I'm trying to write a DataFrame to an Excel file using xlsxwriter and am having some issues with NaN values not being filled properly.

Regardless of whether I use df.fillna, df.replace or the na_rep option on the writer, a few columns retain any NaN values in the DataFrame.

Investigating using pdb.set_trace(), I found the following:

(Pdb) df['col_name'][0]
Decimal('NaN')
(Pdb) Decimal(np.nan)
Decimal('NaN')
(Pdb) df['col_name'][0]==Decimal(np.nan)
False
(Pdb) na=df['col_name'][0]
(Pdb) na
Decimal('NaN')
(Pdb) na==df['col_name'][0]
False
(Pdb) df['col_name'][0]
Decimal('NaN')

How can I identify these values to replace them if I can't define them?

Upvotes: 1

Views: 2248

Answers (3)

Markus Semmler
Markus Semmler

Reputation: 11

Simply use the following function:

def pandas_replace_decimal_by_value(df, column, value):
    idx = df[column].apply(math.isnan)
    df.loc[idx, column] = value

Upvotes: 1

SergiyKolesnikov
SergiyKolesnikov

Reputation: 7815

You cannot use == to check for NaN. You should use np.isnan().

Upvotes: 1

jpp
jpp

Reputation: 164663

If Decimal format is not important to you, you can recast as float.

If Decimal format is important to you, then pandas isn't the ideal choice since it is not vectorizable like numpy arrays.

import pandas as pd, numpy as np
from decimal import Decimal

s = pd.Series([Decimal(np.nan), Decimal(5.1)])

test = s.isnull().tolist()  # [False, False]

s = s.astype(float)

test = s.isnull().tolist()  # [True, False]

Upvotes: 2

Related Questions