Reputation: 87
I created a pivot table from a DataFrame using the code below:
table = pd.pivot_table(df_table, values=['KPI Amount Convert to USD'], index=['Customer Nick', 'Customer',
'Customer Name', 'BSO Name', 'BSO Comment',
'Pay Date, Recovery action, No pay schedule reason '],
columns=['Range'], aggfunc={'KPI Amount Convert to USD': np.sum}, margins=True,
margins_name='Grand Total')
It works great, but some values are NaN (the others are regular numbers).
When I used
table = table.replace(np.nan, '', regex=True)
the NaN values become empty, BUT some values are now shown as 3.0176e+06, even though they were shown as 3017601.99 before.
Do you have any idea how to replace the NaN values but preserve the number format of the others?
Thanks for your advice.
Upvotes: 2
Views: 1373
Reputation: 148965
The problem here is the dtype (data type) of the column, or more exactly of the underlying numpy array. I assume that in your table dataframe, the columns containing NaN values have a floating point type (float64).
If you replaced NaN with 0., all would be fine, but if you want to write an empty string there, pandas changes the dtype to object.
BTW, 3.0176e+06 is just a different representation of 3017601.99, and I would assume that the value has not changed. pandas simply uses different representations for np.float64 type columns and object type columns.
You can ask it to use the default str conversion for float values in object columns by setting the relevant option: pd.set_option('display.float_format', str)
Demo:
>>> pd.set_option('display.float_format', None) # reset option
>>> df = pd.DataFrame([[3017601.99], [np.nan]], columns=['A'])
>>> df
A
0 3017601.99
1 NaN
>>> df1 = df.fillna('')
>>> df1
A
0 3.0176e+06
1
>>> pd.set_option('display.float_format', str) # set the option
>>> df1
A
0 3017601.99
1
>>> df.loc[0,'A'] == df1.loc[0,'A']
True
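To confirm that only the dtype changes and not the stored number, one can inspect the column before and after the fill. This is a minimal sketch reusing the single-column frame from the demo; the variable names (filled, dtype_before, dtype_after, same_value) are only illustrative.

```python
import numpy as np
import pandas as pd

# Same one-column frame as in the demo above.
df = pd.DataFrame([[3017601.99], [np.nan]], columns=['A'])

# Writing a string into a float column forces pandas to upcast it.
filled = df.fillna('')

dtype_before = df['A'].dtype      # floating point column
dtype_after = filled['A'].dtype   # generic object column

# The stored number is identical; only its display differs.
same_value = df.loc[0, 'A'] == filled.loc[0, 'A']

print(dtype_before, dtype_after, same_value)
```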
Upvotes: 3
Reputation: 1500
It's an issue of formatting: basically, when a column is of a given type, its numbers are shown in a certain way.
If your column has only floats (numbers and np.nan both fit in that), it will display things one way.
If your column has floats and strings (numbers and ''), then the column dtype is set to "object" and some things are displayed differently, such as large floats/ints.
This is why df.fillna(0) works (0 fits in a float column, so the dtype remains float) but df.fillna('') triggers the display change.
The actual value does not change, e.g.:
df.loc[2,0]
> 3017601.990
df.fillna('').loc[2,0] == df.fillna(0).loc[2,0]
> True
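If the goal is only to get blank cells in the output, one option (a sketch, not necessarily what the answers above had in mind) is to leave the NaN values and the float dtype in place and substitute the blank at rendering time. The example assumes a small stand-in frame rather than the real pivot table, and uses the na_rep and float_format parameters of DataFrame.to_string; the two-decimal format is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for the pivot table: one float column with a missing value.
df = pd.DataFrame({'A': [3017601.99, np.nan]})

# NaN stays in the data; it is replaced by '' only in the rendered text,
# and floats are printed with two fixed decimals instead of 3.0176e+06.
text = df.to_string(na_rep='', float_format='{:.2f}'.format)
print(text)

# The column is still float64, so sums and comparisons keep working.
dtype_after = df['A'].dtype
```

DataFrame.to_csv and DataFrame.to_excel also take an na_rep argument, so the same idea applies when exporting the table.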
Upvotes: 0