Reputation: 3323
Shown below is the code that compares column values to a constant.
My questions:
Sorry could not format the code properly.
import numpy as np
import pandas as pd
def test_pct_change():
MIN_CHANGE = 0.0050 #.5% For some reason 0.0050 does not work in comparison
data = { 'c1' : pd.Series([100, 110], index=['a', 'b']),
'c2' : pd.Series([100.5, 105, 3.,], index=['a', 'b', 'c']),
'c3' : pd.Series([102, 100, 3.], index=['a', 'b', 'c'])}
df = pd.DataFrame(data)
print df.to_string()
dft_pct = df.pct_change(axis=1) #1: columns
dft_pct['Has_Min_Change'] = (dft_pct.iloc[:, -2] >= MIN_CHANGE) #(dft_pct.iloc[:, -1] >= MIN_CHANGE) &
print 'Percent Change'
print dft_pct.to_string()
Upvotes: 1
Views: 222
Reputation: 294328
This is why numpy
has isclose
Consider the dataframe df
df = pd.DataFrame(np.random.rand(5, 5))
print(df)
0 1 2 3 4
0 0.362368 0.201145 0.340571 0.733402 0.816436
1 0.216386 0.105877 0.565318 0.102514 0.451794
2 0.221733 0.216303 0.039209 0.482731 0.800290
3 0.200427 0.154020 0.612884 0.695920 0.122780
4 0.986003 0.059244 0.291480 0.270779 0.526996
Evaluate an equality we know to be mathematically true
((100 + df) / 100 - 1) == (df / 100)
0 1 2 3 4
0 False False False False False
1 False False False False False
2 False False False False False
3 False False False False False
4 False False False False False
Let's look at the difference.
We can round to 15 decimal places and it still comes back all zeros.
These are really close.
print(((100 + df) / 100 - 1).sub(df / 100).round(15))
0 1 2 3 4
0 -0.0 0.0 0.0 0.0 0.0
1 -0.0 0.0 0.0 0.0 0.0
2 -0.0 -0.0 0.0 -0.0 -0.0
3 -0.0 0.0 0.0 -0.0 0.0
4 0.0 -0.0 -0.0 0.0 0.0
This is why numpy
has isclose
np.isclose(((100 + df) / 100 - 1), (df / 100))
array([[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True]], dtype=bool)
This is the consequence of using binary gates perform decimal math, and we have a work around.
Upvotes: 2
Reputation:
When computations are performed in double precision, 100.5/100 is slightly less than 0.005, and so (100.5/100-1) >= 0.005
evaluates to False. This is why you don't get "Min Change" for the change from 100 to 100.5
If it's really important that such edge cases be included, you can fudge it slightly with the inequality like >= MIN_CHANGE - 1e-15
.
One way to represent the condition that all columns satisfy >= MIN_CHANGE is to take minimum over columns, and require that to be >= MIN_CHANGE. Example:
dft_pct['Has_Min_Change'] = dft_pct.min(axis=1) >= MIN_CHANGE
By default, min
ignores NaN entries. (Watch out for implicit conversion of Booleans to ints, however: False is treated by it as 0).
Upvotes: 0