Reputation: 90
I'm trying to write a DataFrame to an Excel file using xlsxwriter and am having some issues with NaN values not being filled properly.
Regardless of whether I use df.fillna
, df.replace
or the na_rep
option on the writer, a few columns retain any NaN values in the DataFrame.
Investigating using pdb.set_trace()
, I found the following:
(Pdb) df['col_name'][0]
Decimal('NaN')
(Pdb) Decimal(np.nan)
Decimal('NaN')
(Pdb) df['col_name'][0]==Decimal(np.nan)
False
(Pdb) na=df['col_name'][0]
(Pdb) na
Decimal('NaN')
(Pdb) na==df['col_name'][0]
False
(Pdb) df['col_name'][0]
Decimal('NaN')
How can I identify these values to replace them if I can't define them?
Upvotes: 1
Views: 2248
Reputation: 11
Simply use the following function:
def pandas_replace_decimal_by_value(df, column, value):
idx = df[column].apply(math.isnan)
df.loc[idx, column] = value
Upvotes: 1
Reputation: 7815
You cannot use ==
to check for NaN
. You should use np.isnan()
.
Upvotes: 1
Reputation: 164663
If Decimal
format is not important to you, you can recast as float
.
If Decimal
format is important to you, then pandas
isn't the ideal choice since it is not vectorizable like numpy
arrays.
import pandas as pd, numpy as np
from decimal import Decimal
s = pd.Series([Decimal(np.nan), Decimal(5.1)])
test = s.isnull().tolist() # [False, False]
s = s.astype(float)
test = s.isnull().tolist() # [True, False]
Upvotes: 2