Reputation: 173
I have the following pandas DataFrame df:
| clm1 | clm2|
| 79.02 | 80.98|
| 78.55 | 81.47|
| 98.99 | 101.01|
| 999.54 | 999.55|
| 999.55 | 999.55|
I am performing the following calculation on it:
df['avg'] = (df['clm1']+df['clm2'])/2
print(df)
| clm1 | clm2 | avg |
|79.02 | 80.98 | 80.000 |
|78.55 | 81.47 | 80.010 |
|98.99 | 101.01 |100.000 |
|999.54 | 999.55 |999.545 |
|999.55 | 999.55 |999.550 |
When I write the above DataFrame to CSV, I get an incorrect result.
df.to_csv('myfile.csv')
clm1 , clm2 , avg
79.02 , 80.98 , 80.0
78.55 , 81.47 , 80.00999999999999 # This should be 80.01
98.99 , 101.01, 100.0
999.54, 999.55, 999.545
999.55, 999.55, 999.55
I understand the issues with floating point, and I have gone through the answers below:
Python float - str - float weirdness
Is floating point math broken?
These suggest using Decimal instead of float, but I have not been able to find out how to do that.
Note: I do not want to use any rounding method. I need the exact result.
Upvotes: 0
Views: 1352
Reputation: 173
I found the solution.
First convert the columns to string and then to Decimal. This works fine, and I get the correct result without rounding.
from decimal import Decimal
def getAvg(x, y):
    # x and y are Series of strings; Decimal parses each string digit for digit
    return (x.apply(Decimal) + y.apply(Decimal)) / Decimal(2)
df['avg'] = getAvg(df['clm1'].astype('str'), df['clm2'].astype('str'))
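The detour through a string matters because of how Decimal treats a float. The sketch below uses only the standard library and one value from the question to show the difference. It assumes the floats print as the short strings that were originally entered, which holds for values like these but is not guaranteed for floats that are already the result of earlier arithmetic.

```python
from decimal import Decimal

x = 78.55

# Decimal(float) copies the binary value exactly, including its
# representation error, so it is not equal to the decimal 78.55.
from_float = Decimal(x)

# str(float) yields the shortest string that round-trips ("78.55"),
# and Decimal parses that string digit for digit.
from_str = Decimal(str(x))

print(from_float == Decimal("78.55"))  # False
print(from_str == Decimal("78.55"))    # True

# The row that came out as 80.00999999999999 with floats is exact here.
avg = (Decimal(str(78.55)) + Decimal(str(81.47))) / Decimal(2)
print(avg)  # 80.01
```

This is why `astype('str')` comes before `apply(Decimal)` in the function above: applying Decimal directly to the float column would carry the binary error into the result.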
Upvotes: 1
Reputation: 1075
Here is an option that does the arithmetic on integers to avoid floating-point error. This works when taking the average of two columns.
#recreate data
import pandas as pd
df = pd.DataFrame([[79.02,80.98],
[78.55,81.47],
[98.99,101.01],
[999.54,999.55],
[999.55,999.55]], columns = ['clm1','clm2'])
#scale by 1000, round off float noise, and cast all values to integers
df = (df*1000).round().astype(int)
df['avg'] = ((df['clm1']+df['clm2'])/2).astype(int)
#return to floating point
df = (df/1000)
df.to_csv('pandasfile.csv')
The output of the above is:
,clm1,clm2,avg
0,79.02,80.98,80.0
1,78.55,81.47,80.01
2,98.99,101.01,100.0
3,999.54,999.55,999.545
4,999.55,999.55,999.55
Another option: use the Decimal class with pandas. This is tedious and slow if you have to cast a large number of floats from your DataFrame to Decimal. Assuming you import everything as a Decimal, the process would be as follows.
from decimal import Decimal
df = pd.DataFrame([[Decimal("79.02"),Decimal("80.98")],
[Decimal("78.55"),Decimal("81.47")],
[Decimal("98.99"),Decimal("101.01")],
[Decimal("999.54"),Decimal("999.55")],
[Decimal("999.55"),Decimal("999.55")]], columns = ['clm1','clm2'])
df['avg'] = (df['clm1']+df['clm2'])/2
df.to_csv('pandasfile.csv')
This gives the following in the csv file:
,clm1,clm2,avg
0,79.02,80.98,80.00
1,78.55,81.47,80.01
2,98.99,101.01,100.00
3,999.54,999.55,999.545
4,999.55,999.55,999.55
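If the data comes from a CSV file rather than being typed in, one way to "import everything as a Decimal" is the `converters` argument of `pd.read_csv`, which hands each cell to the converter as a string before any float is created. This is only a sketch: the in-memory `raw` input stands in for a real file, and the column names are taken from the question.

```python
import io
from decimal import Decimal

import pandas as pd

# Hypothetical input; in practice this would be a path to the source file.
raw = io.StringIO("clm1,clm2\n79.02,80.98\n78.55,81.47\n999.54,999.55\n")

# Each cell is passed to Decimal as text, so the values never become floats.
df = pd.read_csv(raw, converters={"clm1": Decimal, "clm2": Decimal})

# The columns have object dtype; the arithmetic runs on Decimal objects.
df["avg"] = (df["clm1"] + df["clm2"]) / 2

out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

The columns are stored as Python objects, so this is slower than float arithmetic, as noted above.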
Original answer: You can use the float_format argument of the to_csv method.
df['avg'] = (df['clm1']+df['clm2'])/2
Specify the number of decimals with float_format:
df.to_csv('pandasfile.csv', float_format='%.3f')
This writes the following to the csv file:
,clm1,clm2,avg
0,79.020,80.980,80.000
1,78.550,81.470,80.010
2,98.990,101.010,100.000
3,999.540,999.550,999.545
4,999.550,999.550,999.550
Upvotes: 2
Reputation: 3001
Here is a small example of using the Decimal class (though not with pandas):
from decimal import Decimal
xs = [Decimal("79.02"), Decimal("78.55"), Decimal("98.99"),
Decimal("999.54"), Decimal("999.55")]
ys = [Decimal("80.98"), Decimal("81.47"), Decimal("101.01"),
Decimal("999.55"), Decimal("999.55")]
# conversion with str() is to align columns
for x, y in zip(xs, ys):
    print(f'{str(x):>8s} {str(y):>8s} {str((x + y) / 2):>8s}')
   79.02    80.98    80.00
   78.55    81.47    80.01
   98.99   101.01   100.00
  999.54   999.55  999.545
  999.55   999.55   999.55
The decimal package has several rounding options; see the decimal module docs.
Upvotes: 0