Reputation: 173

Pandas calculation giving incorrect decimal points

I have the following pandas DataFrame df:

|      clm1  |      clm2|
|     79.02  |     80.98|
|     78.55  |     81.47|
|     98.99  |    101.01|
|    999.54  |    999.55|
|    999.55  |    999.55|

I perform the following calculation on it:

df['avg'] = (df['clm1']+df['clm2'])/2

print(df)

| clm1   |    clm2   |  avg   |
|79.02   |    80.98  | 80.000 |
|78.55   |    81.47  | 80.010 |
|98.99   |   101.01  |100.000 |
|999.54  |   999.55  |999.545 |
|999.55  |   999.55  |999.550 |

When I write the above DataFrame to CSV, I get an incorrect result:

df.to_csv('myfile.csv')

clm1  , clm2  , avg
79.02 , 80.98 , 80.0
78.55 , 81.47 , 80.00999999999999  # This should be 80.01
98.99 , 101.01, 100.0
999.54, 999.55, 999.545
999.55, 999.55, 999.55

I understand the issues with floating point, and I have gone through the answers to these questions: "Python float - str - float weirdness" and "Is floating point math broken?"

These suggest using Decimal instead of float, but I have not been able to find out how to do that. Note: I do not want to use any rounding method. I need the exact result.

Upvotes: 0

Answers (3)

user812142

Reputation: 173

I found the solution.

First convert the columns to string and then to Decimal. This is working perfectly fine and I am getting the correct result without rounding off.

from decimal import Decimal

def getAvg(x, y):
    # x and y are Series of strings, so Decimal sees the digits as written
    return (x.apply(Decimal) + y.apply(Decimal)) / Decimal(2)

df['avg'] = getAvg(df['clm1'].astype('str'),df['clm2'].astype('str'))
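The reason for going through str first can be checked with a minimal sketch using only the standard library. The values are taken from the question; the variable names are just for illustration.

```python
from decimal import Decimal

# Built from a float: Decimal captures the binary approximation of 78.55.
from_float = Decimal(78.55)

# Built from a string: Decimal captures the digits exactly as written.
from_str = Decimal("78.55")

print(from_float == from_str)  # False

# Averaging the string-based Decimals gives the exact decimal result.
avg = (Decimal("78.55") + Decimal("81.47")) / Decimal(2)
print(avg)  # 80.01
```

This is why the columns are converted with astype('str') before Decimal is applied.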

Upvotes: 1

B. Bogart

Reputation: 1075

Here is an option that scales the values and casts them to int to avoid floating-point error. This works when taking the average of two columns.

#recreate data
import pandas as pd

df = pd.DataFrame([[79.02,80.98],
                   [78.55,81.47],
                   [98.99,101.01],
                   [999.54,999.55],
                   [999.55,999.55]], columns = ['clm1','clm2'])

#scale by 1000 and cast all values to integers
#(round first, in case a product lands just off the integer)
df = (df*1000).round().astype(int)
df['avg'] = ((df['clm1']+df['clm2'])/2).astype(int)

#return to floating point
df = (df/1000)
df.to_csv('pandasfile.csv')

The output of the above is:

,clm1,clm2,avg
0,79.02,80.98,80.0
1,78.55,81.47,80.01
2,98.99,101.01,100.0
3,999.54,999.55,999.545
4,999.55,999.55,999.55

Another option: use the Decimal class with pandas. This is tedious and slow if you have to cast a large number of floats from your DataFrame to Decimals. Assuming you import everything as a Decimal, the process would be as follows.

from decimal import Decimal

df = pd.DataFrame([[Decimal("79.02"),Decimal("80.98")],
                   [Decimal("78.55"),Decimal("81.47")],
                   [Decimal("98.99"),Decimal("101.01")],
                   [Decimal("999.54"),Decimal("999.55")],
                   [Decimal("999.55"),Decimal("999.55")]], columns = ['clm1','clm2'])

df['avg'] = (df['clm1']+df['clm2'])/2
df.to_csv('pandasfile.csv')

This gives the following in the csv file:

,clm1,clm2,avg
0,79.02,80.98,80.00
1,78.55,81.47,80.01
2,98.99,101.01,100.00
3,999.54,999.55,999.545
4,999.55,999.55,999.55
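If the data comes from a CSV file, one way to import everything as a Decimal is the converters argument of pd.read_csv, which passes each cell to the given function as a string. The following is a minimal sketch, assuming the input has the two columns from the question; the in-memory csv_text stands in for a real file.

```python
import io
from decimal import Decimal

import pandas as pd

# Stand-in for a file on disk (hypothetical sample input).
csv_text = "clm1,clm2\n79.02,80.98\n78.55,81.47\n"

# Each cell is handed to Decimal as a string, so no float is ever created.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"clm1": Decimal, "clm2": Decimal},
)

# The columns have object dtype; arithmetic is done element-wise by Decimal.
df["avg"] = (df["clm1"] + df["clm2"]) / 2

# With no path given, to_csv returns the CSV text as a string.
out = df.to_csv(index=False)
print(out)
```

The columns stay as object dtype, so the speed caveat above still applies.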

Original answer: You can use the float_format argument of the to_csv method.

df['avg'] = (df['clm1']+df['clm2'])/2

Specify the number of decimals with float_format:

df.to_csv('pandasfile.csv', float_format='%.3f')

This writes the following to the csv file:

,clm1,clm2,avg
0,79.020,80.980,80.000
1,78.550,81.470,80.010
2,98.990,101.010,100.000
3,999.540,999.550,999.545
4,999.550,999.550,999.550

Upvotes: 2

jsmart

Reputation: 3001

Here is a small example of using the Decimal class (though not with pandas):

from decimal import Decimal

xs = [Decimal("79.02"), Decimal("78.55"), Decimal("98.99"),
     Decimal("999.54"), Decimal("999.55")]

ys = [Decimal("80.98"), Decimal("81.47"), Decimal("101.01"), 
      Decimal("999.55"), Decimal("999.55")]

# conversion with str() is to align columns
for x, y in zip(xs, ys):
    print(f'{str(x):>8s} {str(y):>8s} {str((x + y) / 2):>8s}')

   79.02    80.98    80.00
   78.55    81.47    80.01
   98.99   101.01   100.00
  999.54   999.55  999.545
  999.55   999.55   999.55

The Python built-in decimal package has several rounding options; see the documentation for the decimal module.
'What Every Computer Scientist Should Know About Floating-Point Arithmetic' has an accessible overview of the IEEE floating-point standard.
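As a rough illustration of those rounding options, here is a sketch using Decimal.quantize with three of the rounding modes from the decimal module. The value 999.545 comes from the example above; rounding to two decimal places is only an assumption for the demonstration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

value = Decimal("999.545")
two_places = Decimal("0.01")  # target exponent: two decimal places

# Ties go to the nearest even digit (the default mode of the decimal context).
half_even = value.quantize(two_places, rounding=ROUND_HALF_EVEN)

# Ties go away from zero.
half_up = value.quantize(two_places, rounding=ROUND_HALF_UP)

# Always truncate toward zero.
down = value.quantize(two_places, rounding=ROUND_DOWN)

print(half_even)  # 999.54
print(half_up)    # 999.55
print(down)       # 999.54
```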

Upvotes: 0
