find the minimum value of a column between two occurrences of a value in another column in a Python data frame

Question

I have stock price data containing Open, High, Low,Close prices on a daily basis. I am creating a new column "signal", which will take the values "signal" or "none" based on some conditions.

Every time df['signal']=="signal",we have to compare it with the previous 3 occurrences of df['signal']=="signal". let us imagine the current occurrence to be the 4th signal. So, the previous occurrence of df['signal']=="signal" would be the 3rd signal, the even previous would the 2nd signal and the signal previous to that would be the first signal.

I need to check if the minimum value of df['low']between signal 4 and signal 3 is GREATER THAN the minimum value of df['low'] between signal 1 and signal 2.

If it is greater, I need a new column df['trade']=="Buy".

Sample data

No Open High Low Close signal Trade 

1   75   95   65  50    signal
2   78   94   74  77    none
3   83   91   81  84    none
4   91   101  88  93    signal
5   104  121  95  103   none
6   101  111  99  105   none
7   97   108  95  101   signal
8   103  113  102 106   none
9   108  128  105 114   signal  BUY
10  104  114  99  102   none
11  110  130  105 115   signal  BUY
12  112  122  110 115   none
13  118  145  112 123   none
14  123  143  71  133   signal  NONE
15  130  150  120 140   none

In the sample data above, in Line no 9, df['Trade']=="BUY" happens since the minimum value of df['Low']=95 between this df['signal']="signal" and previous df['signal']="signal" IS GREATER THAN the minimum value of df['Low']= 65 between the previous two occurences of df['signal']="signal".

Similarly, in Line no 14, df['Trade']="None" happened because the minimum value of df['Low']=71 between this signal and previous signal is NOT GREATER THAN the minimum value of df['Low']=99 between the previous two signals.

I need help with the code to implement this.


    import pandas as pd
    import numpy as np
    import bisect as bs

    df = pd.read_csv("Nifty.csv")
    cols = ['No', 'Low', 'signal']
    df['5EMA'] = df['Close'].ewm(span=5).mean()
    df['10EMA'] = df['Close'].ewm(span=10).mean()
    condition1 = df['5EMA'].shift(1) < df['10EMA'].shift(1)
    condition2 = df['5EMA'] > df['10EMA']
    df['signal'] = np.where(condition1 & condition2, 'signal', None)
    df1 = pd.concat([df[cols], df.loc[df.signal=='signal',cols].assign(signal='temp')]) \
            .sort_values(['No', 'signal'],ascending=[1,0])
    df1['g'] = (df1.signal == 'signal').cumsum()
    df1['Low_min'] = df1.groupby('g').Low.transform('min')
    s = df1.groupby('g').Low.min()
    buy = s[s.shift(1) > s.shift(3)].index.tolist()
    m1 = df1.signal.eq('signal') & df1.g.gt(3)
    m2 = df1.g.isin(buy) & m1
    df1['trade'] = np.select([m2, m1], ['Buy', 'None'], '')
    df['trade'] = ''
    df.trade.update(df1.loc[df1.signal=='signal',"trade"])
    print(df)

jxc · Accepted Answer

Your problem can be simplified after some extra temporary rows are added. I set up a new dataframe which contains only required fields from the original df, and cloned all rows labelled as 'signal' but renamed to 'temp' df.loc[df.signal=='signal',cols].assign(signal='temp'). The sorted rows will then be grouped-labeled by using the "signal" and cumsum(). see below code:

str="""No Open High Low Close signal 
1   75   95   65  50    signal 
2   78   94   74  77    none 
3   83   91   81  84    none 
4   91   101  88  93    signal 
5   104  121  95  103   none 
6   101  111  99  105   none 
7   97   108  95  101   signal 
8   103  113  102 106   none 
9   108  128  105 114   signal 
10  104  114  99  102   none 
11  110  130  105 115   signal 
12  112  122  110 115   none 
13  118  145  112 123   none 
14  123  143  71  133   signal 
15  130  150  120 140   none"""

df = pd.read_csv(pd.io.common.StringIO(str), sep='\s+')

# cols which are used in this task 
cols = ['No', 'Low', 'signal']

# create a new dataframe, cloned all 'signal' rows but rename signal to 'temp', sort the rows
df1 = pd.concat([df[cols], df.loc[df.signal=='signal',cols].assign(signal='temp')]) \
        .sort_values(['No', 'signal'],ascending=[1,0])

# set up group-number with cumsum() and get min() value from each group
df1['g'] = (df1.signal == 'signal').cumsum()
# the following field just for reference, no need for calculation
df1['Low_min'] = df1.groupby('g').Low.transform('min')

The new dataframe df1 will looks like the following. Except the first and last group, every group now starts with a 'signal' and ends with a 'temp' (which is also 'signal'):

Based on your description, for Line no 9 (yellow backgroud, the first item in df1.g==4), we can check df1.loc[df1.g==3, "Low_min"] (red bordered) against df1.loc[df1.g==1, "Low_min"] (green bordered)

if we have the following:

s = df1.groupby('g').Low.min()

the list of buy group should satisfy s.shift(1) > s.shift(3)

buy = s[s.shift(1) > s.shift(3)].index.tolist()

So, let's set up conditions:

# m1: row marked with signal
# skip the first 3 groups which do not have enough signals
m1 = df1.signal.eq('signal') & df1.g.gt(3)

# m2: m1 plus must in buy list
m2 = df1.g.isin(buy) & m1
df1['trade'] = np.select([m2, m1], ['Buy', 'None'], '')
#In [36]: df1
#Out[36]: 
#    No  Low  signal  g  Low_min trade
#0    1   65    temp  0       65      
#0    1   65  signal  1       65      
#1    2   74    none  1       65      
#2    3   81    none  1       65      
#3    4   88    temp  1       65      
#3    4   88  signal  2       88      
#4    5   95    none  2       88      
#5    6   99    none  2       88      
#6    7   95    temp  2       88      
#6    7   95  signal  3       95      
#7    8  102    none  3       95      
#8    9  105    temp  3       95      
#8    9  105  signal  4       99   Buy
#9   10   99    none  4       99      
#10  11  105    temp  4       99      
#10  11  105  signal  5       71   Buy
#11  12  110    none  5       71      
#12  13  112    none  5       71      
#13  14   71    temp  5       71      
#13  14   71  signal  6       71  None
#14  15  120    none  6       71

After we have df1.trade, we can update the original dataframe:

# set up column `trade` with EMPTY as default and update 
# the field based on df1.trade (using the index)
df['trade'] = ''
df.trade.update(df1.loc[df1.signal=='signal',"trade"])

find the minimum value of a column between two occurrences of a value in another column in a Python data frame

Answers (1)

Related Questions