BlueQuant
BlueQuant

Reputation: 345

How to perform single operation on Multiple columns of Dataframe

I have the following dataframe:

df
>>>                                     TSLA             MSFT
2017-05-15 00:00:00+00:00                320              68
2017-05-16 00:00:00+00:00                319              69
2017-05-17 00:00:00+00:00                314              61
2017-05-18 00:00:00+00:00                313              66
2017-05-19 00:00:00+00:00                316              62
2017-05-22 00:00:00+00:00                314              65
2017-05-23 00:00:00+00:00                310              63


max_idx = df.idxmax() # returns index of max value
>>> TSLA    2017-05-15 00:00:00+00:00
>>> MSFT    2017-05-16 00:00:00+00:00

max_value = df.max() # returns max value
>>> TSLA = 320
>>> MSFT = 69

def pct_change(first, second):  # pct chg formula
    return (second-first) / first*100.00

I want to get percent change between max_value and with each consecutive value starting from max_idx (df.loc[max_idx:]) for both columns. Just to ensure that, the percent change is not below 5%.

Example: 
for TSLA:  320 with 319 = 2%       for MSFT: 69 with 61 = 4%
           320 with 314 = 4%                 69 with 66 = 5% 
           320 with 313 = 5%                 69 with 62 = 10%

Edit: If you find it difficult to answer, i can be happy with just a reference to what type of function or method i shall use for such operations.

Note: I just want to ensure that percent change isn't below 5%.

Upvotes: 0

Views: 778

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

I am not sure about your true/false conditions, but I think you need something like this, thanks to @JohnGalt:

df.apply(lambda x: ((1 - x/x.max()) > 0.05).all())

Or using your logic:

df.apply(lambda x: ((x[x.idxmax()]-x)/x[x.idxmax()]*100>5).all())

Output:

TSLA    False
MSFT    False
dtype: bool

Let's look at one column,

John's formula:

1 - df.TSLA/df.TSLA.max()

Returns:

2017-05-15 00:00:00+00:00    0.000000
2017-05-16 00:00:00+00:00    0.003125
2017-05-17 00:00:00+00:00    0.018750
2017-05-18 00:00:00+00:00    0.021875
2017-05-19 00:00:00+00:00    0.012500
2017-05-22 00:00:00+00:00    0.018750
2017-05-23 00:00:00+00:00    0.031250
Name: TSLA, dtype: float64

If all of those values are greater than 5 return True, else return False.

My original formula works also, just a bit more calculation to do the same thing that John formula does. Lastly, use lambda function to apply this formula to each column independently.

Upvotes: 3

Related Questions