Reputation: 73
I have a column in a pandas DataFrame that looks like this (it is much longer, but here are the first few rows):
>df_fill['col1']
0 5987.8866699999998672865
1 52215.5966699999989941716
2 201.8966700000000003001
3 3.8199999999999998401
I want to round the entire column to 5 decimal places. I can round it to integers, but not to any number of digits after the decimal point. The column's dtype is float.
> np.around(df_fill['col1'], 0)
0 5988
1 52216
2 202
3 4
> np.around(df_fill['col1'], 5)
0 5987.8866699999998672865
1 52215.5966699999989941716
2 201.8966700000000003001
3 3.8199999999999998401
> (df_fill['col1']).round()
0 5988
1 52216
2 202
3 4
>(df_fill['col1']).round(5)
0 5987.8866699999998672865
1 52215.5966699999989941716
2 201.8966700000000003001
3 3.8199999999999998401
> (df_fill['col1']).round(decimals=5)
0 5987.8866699999998672865
1 52215.5966699999989941716
2 201.8966700000000003001
3 3.8199999999999998401
> str((df_fill['col1']).round(decimals=5))
'0 5987.8866699999998672865\n1 52215.5966699999989941716\n2 201.8966700000000003001\n3 3.8199999999999998401\
What am I missing here?
Upvotes: 2
Views: 5127
Reputation: 881037
Floats can represent only a subset of the real numbers. A float can exactly represent only those decimals whose fractional part is a finite sum of negative powers of two ("binary fractions"). After you round a float to 5 digits, the result may not be the real number with exactly 5 decimal digits, because that number's fractional part may not be expressible as a binary fraction. Instead, rounding returns the float closest to that real number.
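To make this concrete, here is a minimal sketch using only Python's standard library (the variable names are illustrative and not from the question). `Decimal(x)` shows the exact value a float stores, so it reveals which decimals are binary fractions and which are not.

```python
from decimal import Decimal

# 0.75 = 1/2 + 1/4 is a binary fraction, so the float stores it exactly.
exact_three_quarters = Decimal(0.75) == Decimal("0.75")

# 0.1 is not a binary fraction; the float holds only the nearest representable value.
exact_one_tenth = Decimal(0.1) == Decimal("0.1")

# The first value from the question: rounding to 5 places returns the float
# nearest to 5987.88667, which is the float we started with.
x = 5987.8866699999998672865
rounded = round(x, 5)

print(exact_three_quarters, exact_one_tenth)
print(rounded == x)
print(Decimal(rounded))  # exact stored value, with many more than 5 decimal digits
```

This uses the built-in `round` rather than pandas, but the same representability limit applies to the float64 values in a Series.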
If you have set
pd.options.display.float_format = '{:.23g}'.format
then pandas will show up to 23 significant digits in its string representation of floats:
import pandas as pd
pd.options.display.float_format = '{:.23g}'.format
df_fill = pd.DataFrame({'col1':[ 5987.8866699999998672865, 52215.5966699999989941716,
201.8966700000000003001, 3.8199999999999998401]})
print(df_fill)
# col1
# 0 5987.8866699999998672865
# 1 52215.596669999998994172
# 2 201.89667000000000030013
# 3 3.8199999999999998401279
print(df_fill['col1'].round(5))
# 0 5987.8866699999998672865
# 1 52215.596669999998994172
# 2 201.89667000000000030013
# 3 3.8199999999999998401279
# Name: col1, dtype: float64
But if you set the float_format to display 5 decimal digits:
pd.options.display.float_format = '{:.5f}'.format
then
print(df_fill['col1'].round(5))
yields
0 5987.88667
1 52215.59667
2 201.89667
3 3.82000
Name: col1, dtype: float64
Note that the underlying float has not changed; only the way it is displayed has.
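The difference between display and stored value can be checked directly. The following is a sketch, assuming pandas is installed; the Series `s` and the other names are made up for illustration. It uses `pd.option_context` so the display setting applies only temporarily.

```python
import pandas as pd

s = pd.Series([5987.8866699999998672865, 3.8199999999999998401], name="col1")
rounded = s.round(5)

# option_context applies the display setting only inside the with-block,
# so the global pd.options value is left as it was.
with pd.option_context("display.float_format", "{:.5f}".format):
    shown = rounded.to_string()

# Largest difference between the stored values before and after rounding.
max_diff = (rounded - s).abs().max()

print(shown)
print(max_diff)
print([format(v, ".23g") for v in rounded])  # stored values at full precision
```

The string in `shown` has 5 decimal places, while the last line prints the long expansions again, since formatting never touches the stored floats.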
Upvotes: 5
Reputation: 1088
Your problem comes from the limited precision of floating-point numbers. The number 5987.88667 cannot be represented exactly as a float; the nearest representable number is 5987.8866699999998672865. The column therefore already holds the floats closest to the values you want, so rounding to 5 decimal places has no effect. You already have the correct invocation:
(df_fill['col1']).round(5)
You can see that it works if you round to 2 decimal places instead, so I suggest you don't worry about it. If the issue is how the numbers are displayed on screen, you can format each one as a string with the desired number of decimal places:
print(df_fill['col1'].map('{:.5f}'.format))
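As a quick check of the claim about 2 decimal places, the sketch below rebuilds the four sample values in a Series (the name `col` is only for illustration) and compares how far `round(2)` and `round(5)` move them. It assumes pandas is installed.

```python
import pandas as pd

col = pd.Series([5987.8866699999998672865, 52215.5966699999989941716,
                 201.8966700000000003001, 3.8199999999999998401])

# Largest change introduced by rounding to 2 and to 5 decimal places.
change_2 = (col.round(2) - col).abs().max()
change_5 = (col.round(5) - col).abs().max()

print(col.round(2).tolist())
print(change_2)  # visibly nonzero: rounding to 2 places alters the values
print(change_5)  # zero or negligible: the values already sit at 5 places
```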
Upvotes: 1