Upgrading pandas causing output variations

Question

I'm upgrading my pandas library from 0.25.* to 1.2.2 (the latest version).

When I run the code below, the output values change.

ffilled = df.groupby('masterpatientid')[vtl_cols].fillna(method='ffill')
ffilled['masterpatientid'] = df.masterpatientid

The values change as follows:

14.142135999999999 to 14.142136
50.381046000000005 to 50.381046
1.7320508000000001 to 1.7320508

Although the changes are minimal, I really want to know the reason behind them. I have read the docs and the latest changes made in pandas but couldn't arrive at any conclusion.

Would appreciate it if someone could help.

anon01 · Accepted Answer

It is likely that there is no difference in the stored values.

Pandas may display fewer digits than it stores, for convenience. Setting pd.options.display.precision = 16 makes it show 16 decimal places, which is roughly the full precision of a double.
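
To illustrate how one stored value can print in different ways, here is a minimal sketch in plain Python (no pandas). The value 0.1 is only an illustration and is not taken from the question's data.

```python
# One stored double, printed with different numbers of digits.
x = 0.1

short = repr(x)             # shortest string that round-trips to x
full = format(x, ".17g")    # 17 significant digits
rounded = format(x, ".6f")  # what a 6-decimal display would show

print(short, full, rounded)

# Both the short and the long form parse back to exactly the same double.
assert float(short) == x
assert float(full) == x
```

If the old and new outputs behave like this, only the printed representation changed, not the data.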

That being said, floating point discrepancies are trivial to generate:

q1 = (0.1 + 0.2) + 0.3
q2 = 0.1 + (0.2 + 0.3)
q2 - q1 # -1.1102230246251565e-16

A similar change in the order of operations could have been introduced in the pandas source code, or in the numerical libraries it relies on. For numerical simulation and other areas of scientific computing, discrepancies like this can be a serious problem. For most applications, however, double precision is more than enough and the difference is not worth worrying about.
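
If you want to confirm that the old and new outputs agree up to rounding noise, one option is to compare with a tolerance instead of exact equality. The sketch below uses math.isclose from the standard library on the values quoted in the question. The tolerance of 1e-12 is an assumption, so choose one that suits your data.

```python
import math

# The two sums from above differ only in the last bit.
q1 = (0.1 + 0.2) + 0.3
q2 = 0.1 + (0.2 + 0.3)

print(q1 == q2)              # exact comparison
print(math.isclose(q1, q2))  # comparison within a relative tolerance

# Values quoted in the question, before and after the upgrade.
old = [14.142135999999999, 50.381046000000005, 1.7320508000000001]
new = [14.142136, 50.381046, 1.7320508]

# rel_tol=1e-12 is an assumed tolerance, far tighter than the data's 7-8 digits.
all_close = all(math.isclose(a, b, rel_tol=1e-12) for a, b in zip(old, new))
print(all_close)
```

For whole DataFrames, pandas.testing.assert_frame_equal and numpy.allclose provide similar tolerance-based comparisons.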
