lilyrobin

Reputation: 73

Round pandas data frame/series

I have a column in a pandas data frame that looks like this (much longer but here's the top few rows):

>df_fill['col1']

0      5987.8866699999998672865
1     52215.5966699999989941716
2       201.8966700000000003001
3         3.8199999999999998401

I want to round the entire column to 5 decimal places. I can round it to integers, but not to any number of digits after the decimal point. The column's dtype is float.

> np.around(df_fill['col1'], 0)

0      5988
1     52216
2       202
3         4

> np.around(df_fill['col1'], 5)

0      5987.8866699999998672865
1     52215.5966699999989941716
2       201.8966700000000003001
3         3.8199999999999998401

> (df_fill['col1']).round()

0      5988
1     52216
2       202
3         4

>(df_fill['col1']).round(5)

0      5987.8866699999998672865
1     52215.5966699999989941716
2       201.8966700000000003001
3         3.8199999999999998401

> (df_fill['col1']).round(decimals=5)

0      5987.8866699999998672865
1     52215.5966699999989941716
2       201.8966700000000003001
3         3.8199999999999998401

> str((df_fill['col1']).round(decimals=5))
'0      5987.8866699999998672865\n1     52215.5966699999989941716\n2       201.8966700000000003001\n3         3.8199999999999998401\

What am I missing here?

Upvotes: 2

Views: 5127

Answers (2)

unutbu

Reputation: 881037

Floats can represent only a subset of the real numbers. A float can exactly represent only those decimals whose fractional part is a sum of negative powers of two ("binary fractions"), and only within its limited precision. After you round a float to 5 decimal places, the result may not be the real number with those 5 decimal digits, because that decimal may not be expressible as a binary fraction. Instead, rounding returns the float closest to that real number.
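A quick way to see this is with Python's standard `decimal` and `fractions` modules (an illustration added here, not part of the original answer): `Decimal(x)` prints the exact value stored in a float, and `Fraction(x)` shows that value as a ratio with a power-of-two denominator. The value 5987.88667 is taken from the question's first row.

```python
from decimal import Decimal
from fractions import Fraction

x = 5987.88667  # Python stores the nearest binary float to this decimal

# Decimal(float) reveals the exact stored value, digit for digit;
# it is close to, but not equal to, 5987.88667.
print(Decimal(x))

# The stored value is an exact fraction whose denominator is a power of two.
frac = Fraction(x)
print(frac.denominator)

# Rounding to 5 places returns the float nearest 5987.88667, which is x itself.
print(round(x, 5) == x)
```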

If you have set

pd.options.display.float_format = '{:.23g}'.format

then Pandas will show up to 23 significant digits in its string representation of floats:

import pandas as pd

pd.options.display.float_format = '{:.23g}'.format

df_fill = pd.DataFrame({'col1':[ 5987.8866699999998672865, 52215.5966699999989941716, 
                                201.8966700000000003001, 3.8199999999999998401]})

print(df_fill)
#                       col1
# 0 5987.8866699999998672865
# 1 52215.596669999998994172
# 2 201.89667000000000030013
# 3 3.8199999999999998401279

print(df_fill['col1'].round(5))
# 0   5987.8866699999998672865
# 1   52215.596669999998994172
# 2   201.89667000000000030013
# 3   3.8199999999999998401279
# Name: col1, dtype: float64

But if you set the float_format to display 5 decimal digits:

pd.options.display.float_format = '{:.5f}'.format

then

print(df_fill['col1'].round(5))

yields

0    5987.88667
1   52215.59667
2     201.89667
3       3.82000
Name: col1, dtype: float64

Note that the underlying float has not changed; only the way it is displayed has.
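A minimal sketch of the same idea using `pd.option_context`, which applies a display option only inside a `with` block and restores the previous setting afterwards. The short Series `s` is a hypothetical stand-in for `df_fill['col1']`.

```python
import pandas as pd

# Hypothetical stand-in for df_fill['col1'].
s = pd.Series([5987.88667, 52215.59667, 201.89667, 3.82], name="col1")
before = s.copy()

# Temporarily show floats with 5 decimal places; the option reverts on exit.
with pd.option_context("display.float_format", "{:.5f}".format):
    shown = repr(s)

print(shown)

# The display option changed only the text, never the stored floats.
print(s.equals(before))
```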

Upvotes: 5

Conor

Reputation: 1088

Your problem is due to a precision issue in representing floating-point numbers. The number 5987.88667 cannot be represented exactly as a float; the nearest representable number is 5987.8866699999998672865. So the column already holds the float closest to the number you want, and rounding it to 5 decimal places has no effect. You already have the correct invocation:

(df_fill['col1']).round(5)
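To check this claim, the sketch below compares the rounded Series with the original. It assumes the column holds the nearest floats to the question's 5-decimal values; `s` is a hypothetical stand-in for `df_fill['col1']`.

```python
import pandas as pd

# Hypothetical stand-in for df_fill['col1']; pandas stores the nearest
# float64 to each of these 5-decimal values.
s = pd.Series([5987.88667, 52215.59667, 201.89667, 3.82], name="col1")

rounded = s.round(5)

# Rounding to 5 places lands on the very same floats, so nothing changes.
unchanged = rounded.equals(s)
print(unchanged)  # True
```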

You can see that it works if you round to 2 decimal places instead, so I suggest you don't worry about it. If the issue is how the number is displayed on screen, you can format each value as a string with the desired number of decimal places:

print(df_fill['col1'].map(lambda v: "%.5f" % v))
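Rounding to 2 decimal places, by contrast, does change the stored values, because the nearest floats to the 2-decimal numbers differ from the originals. A small sketch using the same hypothetical stand-in Series `s` for `df_fill['col1']`:

```python
import pandas as pd

# Hypothetical stand-in for df_fill['col1'].
s = pd.Series([5987.88667, 52215.59667, 201.89667, 3.82], name="col1")

two_places = s.round(2)
print(two_places.tolist())  # [5987.89, 52215.6, 201.9, 3.82]

# Only the last value (already 3.82) is left unchanged.
changed = (two_places != s).tolist()
print(changed)  # [True, True, True, False]
```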

Upvotes: 1
