Winter

Reputation: 87

Replace NaN values in a pivot table without changing the other values

I created a pivot table from a dataframe using the code below:

table = pd.pivot_table(
    df_table,
    values=['KPI Amount Convert to USD'],
    index=['Customer Nick', 'Customer', 'Customer Name', 'BSO Name', 'BSO Comment',
           'Pay Date, Recovery action, No pay schedule reason '],
    columns=['Range'],
    aggfunc={'KPI Amount Convert to USD': np.sum},
    margins=True, margins_name='Grand Total',
)

It works great, but some values are NaN (the others are regular numbers).

When I used

table = table.replace(np.nan, '', regex=True)

the NaN values become empty, BUT some values are now shown as 3.0176e+06, where before they were shown as 3017601.99.

Do you have any idea how to replace the NaN values but preserve the number format of the others?

Thanks for your advice.

Upvotes: 2

Views: 1373

Answers (4)

Serge Ballesta

Reputation: 148965

The problem here is the dtype (data type) of the column, or more precisely of the underlying numpy array. I assume that in your table dataframe, the columns containing NaN values have a floating point type (float64).

If you replaced NaN with 0., all would be fine, but if you write an empty string there, pandas changes the dtype to object.

BTW, 3.0176e+06 is just a different representation of 3017601.99; the value itself has not changed. Pandas simply uses different representations for np.float64 columns and object columns.

You can ask it to use the default str conversion for float values in object columns by setting the relevant option: pd.set_option('display.float_format', str)

Demo:

>>> pd.set_option('display.float_format', None)                # reset option
>>> df = pd.DataFrame([[3017601.99], [np.nan]], columns=['A'])
>>> df
            A
0  3017601.99
1         NaN
>>> df1 = df.fillna('')
>>> df1
            A
0  3.0176e+06
1            
>>> pd.set_option('display.float_format', str)      # set the option
>>> df1
           A
0 3017601.99
1           
>>> df.loc[0,'A'] == df1.loc[0,'A']
True
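
If the goal is only a clean printout, another option is to leave the NaN values in place, so the column stays float64, and blank them when rendering. The following is a minimal sketch, assuming a one-column frame like the demo above and a two-decimal format (both are illustrative choices, not taken from the question):

```python
import numpy as np
import pandas as pd

# Same toy data as the demo: one large float and one missing value.
df = pd.DataFrame({"A": [3017601.99, np.nan]})

# NaN stays in the data; na_rep only controls how it is rendered,
# and float_format controls how the non-missing floats are rendered.
text = df.to_string(na_rep="", float_format="{:.2f}".format)
print(text)
```

Because the data itself is untouched, the column can still be summed or compared numerically afterwards.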

Upvotes: 3

Jim Eisenberg

Reputation: 1500

It's a formatting issue: the way numbers are displayed depends on the dtype of the column.

If your column has only floats (numbers and np.nan both fit in that), it will display things one way.

If your column has floats and strings (numbers and '') then the column dtype is set to "object" and some things, such as large floats/ints, are displayed differently.

This is why df.fillna(0) works (0 fits in a float column, so the dtype remains float) but df.fillna('') causes the display change.

The actual value does not change, e.g.:

df.loc[2,0]
> 3017601.990
df.fillna('').loc[2,0] == df.fillna(0).loc[2,0]
> True
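
The dtype change described above can be checked directly. This is a small sketch on a made-up one-column frame (the column name "A" and the values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3017601.99, np.nan]})

zero_filled = df.fillna(0)    # 0 fits in a float column
blank_filled = df.fillna("")  # '' does not, so the column is upcast

print(df["A"].dtype)            # float64
print(zero_filled["A"].dtype)   # float64
print(blank_filled["A"].dtype)  # object

# The stored number is the same in both results; only the display differs.
same_value = blank_filled.loc[0, "A"] == zero_filled.loc[0, "A"]
print(same_value)
```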

Upvotes: 0

alchemist_

Reputation: 90

  table = table.fillna('-')

or

  table = table.fillna(0)

Upvotes: 0

hereismunna

Reputation: 1

Did you try table = table.fillna('')?

Upvotes: 0
