Pranav Shah
Pranav Shah

Reputation: 3323

Comparing values of all columns except one

Shown below is the code that compares column values to a constant.

My questions:

  1. Why does the ">=" comparison show "False" for 0.005000 for row "a". I expect it to be true.
  2. Is it possible to repeat the comparison for all columns except the first and "AND" the results

Sorry could not format the code properly.

import numpy as np  
import pandas as pd

def test_pct_change():  

    MIN_CHANGE = 0.0050 #.5%  For some reason 0.0050 does not work in comparison

    data = { 'c1' : pd.Series([100, 110], index=['a', 'b']),
          'c2' : pd.Series([100.5, 105, 3.,], index=['a', 'b', 'c']),
          'c3' : pd.Series([102, 100, 3.], index=['a', 'b', 'c'])}

    df = pd.DataFrame(data)

    print df.to_string()

    dft_pct = df.pct_change(axis=1) #1: columns
    dft_pct['Has_Min_Change'] = (dft_pct.iloc[:, -2] >= MIN_CHANGE) #(dft_pct.iloc[:, -1] >= MIN_CHANGE) & 
    print 'Percent Change'
    print dft_pct.to_string()

Upvotes: 1

Views: 222

Answers (2)

piRSquared
piRSquared

Reputation: 294328

This is why numpy has isclose

Consider the dataframe df

df = pd.DataFrame(np.random.rand(5, 5))
print(df)

          0         1         2         3         4
0  0.362368  0.201145  0.340571  0.733402  0.816436
1  0.216386  0.105877  0.565318  0.102514  0.451794
2  0.221733  0.216303  0.039209  0.482731  0.800290
3  0.200427  0.154020  0.612884  0.695920  0.122780
4  0.986003  0.059244  0.291480  0.270779  0.526996

Evaluate an equality we know to be mathematically true

((100 + df) / 100 - 1) == (df / 100)

       0      1      2      3      4
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False  False

Let's look at the difference.
We can round to 15 decimal places and it still comes back all zeros.
These are really close.

print(((100 + df) / 100 - 1).sub(df / 100).round(15))

     0    1    2    3    4
0 -0.0  0.0  0.0  0.0  0.0
1 -0.0  0.0  0.0  0.0  0.0
2 -0.0 -0.0  0.0 -0.0 -0.0
3 -0.0  0.0  0.0 -0.0  0.0
4  0.0 -0.0 -0.0  0.0  0.0

This is why numpy has isclose

np.isclose(((100 + df) / 100 - 1), (df / 100))

array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

This is the consequence of using binary gates perform decimal math, and we have a work around.

Upvotes: 2

user6655984
user6655984

Reputation:

When computations are performed in double precision, 100.5/100 is slightly less than 0.005, and so (100.5/100-1) >= 0.005 evaluates to False. This is why you don't get "Min Change" for the change from 100 to 100.5

If it's really important that such edge cases be included, you can fudge it slightly with the inequality like >= MIN_CHANGE - 1e-15.

One way to represent the condition that all columns satisfy >= MIN_CHANGE is to take minimum over columns, and require that to be >= MIN_CHANGE. Example:

dft_pct['Has_Min_Change'] = dft_pct.min(axis=1) >= MIN_CHANGE

By default, min ignores NaN entries. (Watch out for implicit conversion of Booleans to ints, however: False is treated by it as 0).

Upvotes: 0

Related Questions