Reputation: 433

pandas.DataFrame.round doesn't seem to work on my DataFrames - Rounding issue causes extra data stored in csv files

I've stumbled upon a small issue when using pandas DataFrame:

I have a big csv file (around 2 GB of data) containing the price of an asset, created using pandas' DataFrame.to_csv() function. When I inspect the file closely, the first lines look like this:

DateTime,open,high,low,close
2016-01-04 00:36:18,1.08505,1.08505,1.08504,1.08504
2016-01-04 00:36:19,1.08505,1.08505,1.08504,1.08504
2016-01-04 00:36:20,1.08503,1.08503,1.08495,1.08495
2016-01-04 00:36:21,1.0849600000000001,1.0849600000000001,1.0849600000000001,1.0849600000000001
2016-01-04 00:36:22,1.0849600000000001,1.0849600000000001,1.08492,1.08492

The data was created using df.resample('1s').ohlc(), and I thought there were sometimes a few rounding issues, so I tried to round the DataFrame using df.round(5) to keep only 5 decimals, but it doesn't change anything at all.

SEC = pd.read_csv(r"D:\Finance python\Data\EUR_USD\Sec\S1_2015.csv", index_col='DateTime', parse_dates=True, error_bad_lines=False, infer_datetime_format=True)
SEC = SEC.round(5)

The DataFrame stays the same, and I truly wonder why.

When I try it with a csv file containing the 5 rows I gave above:

In[13]: SEC["open"][3]
Out[13]: 1.0849599999999999

It's not an issue when doing calculations on the df (even though it might be faster with fewer decimals), but it seems that a lot of 0s or 9s are being stored in my csv files for nothing, taking up extra space.

It also seems that even values that look fine in the csv file are actually not well rounded when loaded with pandas.

Would anyone have an idea of why the DataFrame is not being rounded properly, or of a way to get shorter csv files when I save them with pandas?

Thanks in advance

Edit: I tried to use the Decimal method, but it still doesn't work. I believe this is because pandas is not able to store Decimal numbers in DataFrames and converts them to floats instead.

Upvotes: 2

Answers (3)

Erlinska

Reputation: 433

I found what the issue was on another post: float64 with pandas to_csv

I need to pass the argument float_format='%.5f' to to_csv to get my csv files the way I want them; the issue is linked to the way floating-point numbers work.
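A minimal sketch of this fix, assuming a small frame built from two of the rows in the question (the names `raw`, `df` and `buffer`, and the in-memory buffer itself, are only for illustration):

```python
import io

import pandas as pd

# Two sample rows from the question, parsed back into float64 columns.
raw = (
    "DateTime,open,high,low,close\n"
    "2016-01-04 00:36:20,1.08503,1.08503,1.08495,1.08495\n"
    "2016-01-04 00:36:21,1.0849600000000001,1.0849600000000001,"
    "1.0849600000000001,1.0849600000000001\n"
)
df = pd.read_csv(io.StringIO(raw), index_col="DateTime", parse_dates=True)

# float_format controls how each float is written as text,
# independently of how it is stored in memory.
buffer = io.StringIO()
df.to_csv(buffer, float_format="%.5f")
csv_text = buffer.getvalue()
print(csv_text)
```

The values kept in memory are untouched; only the text written to the file is shortened.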

Upvotes: 3

Ken Wei

Reputation: 3130

This has to do with the precision of floating point arithmetic; not all numbers can be represented exactly. If you want to set pandas to display numbers to 5 decimal places, you can do

pd.set_option('display.float_format', '{:.5f}'.format)

but the internal representation will stay the same (which, after reading your post more closely, will not solve your issue with the csv).

If you want to change the internal representation, you need to use a Decimal data type:

from decimal import Decimal
df.round(5).astype(Decimal)

FYI, you can reproduce your problem like this:

import numpy as np
np.float64(1.08496)
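Depending on the NumPy version, the repr above may print the shortened form 1.08496 rather than the long one. A small sketch using only the standard library shows the same effect regardless of version (the variable name `x` is just for illustration):

```python
from decimal import Decimal

x = 1.08496

# Decimal(float) exposes the exact binary value stored for the literal,
# which is close to, but not exactly, 1.08496.
print(Decimal(x))

# Rounding to 5 places returns the same double, so nothing appears to change.
print(round(x, 5) == x)

# Formatting, not rounding, is what yields a short text form.
print(format(x, ".5f"))
```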

Upvotes: 3

ileadall42

Reputation: 651

I tried to run your code:

df = pd.read_clipboard(sep=',',engine='python')
df

   DateTime             open     high     low      close
0  2016-01-04 00:36:18  1.08505  1.08505  1.08504  1.08504
1  2016-01-04 00:36:19  1.08505  1.08505  1.08504  1.08504
2  2016-01-04 00:36:20  1.08503  1.08503  1.08495  1.08495

and then use df.round(3)

   DateTime             open   high   low    close
0  2016-01-04 00:36:18  1.085  1.085  1.085  1.085
1  2016-01-04 00:36:19  1.085  1.085  1.085  1.085
2  2016-01-04 00:36:20  1.085  1.085  1.085  1.085

It works for me, but if I use df.astype(str).round(3) instead, it does not work. So I recommend checking the dtypes of your data.
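A minimal sketch of that check, assuming a hypothetical frame whose prices were loaded as text rather than floats (the frame and the names `unchanged` and `rounded` are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose prices were loaded as text instead of floats.
df = pd.DataFrame({"open": ["1.08505", "1.0849600000000001"]})

# round() leaves non-numeric columns untouched, so the strings are unchanged.
unchanged = df.round(3)

# Converting to a numeric dtype first lets round() take effect.
df["open"] = pd.to_numeric(df["open"])
rounded = df.round(3)

print(df.dtypes)
print(rounded)
```

If df.dtypes reports a non-numeric dtype for a price column, that column is being skipped by round().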

Upvotes: 0
