S. A.
S. A.

Reputation: 111

Count the number of rows above since the condition was true

Assume, I have a df with some integers:

mdf = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=list('A'))

And condition for them:

mdf['CM'] = mdf['A'] > mdf['A'].shift(5)

What it the most efficient way to count the number of rows above current since the condition was true?

It should be like so:

def some_function(df):
  """
  Here is calculated nearest row index from current where the condition was met
  """
  return integer_value of None

mdf['SINCE'] = some_function(mdf)

UPD. The df should look so:

Idx  A    CM    SINCE   Formula
0   23  False   None    
1   55  False   None    
2   48  False   None    
3   17  False   None    
4   24   True   0       4-4
5   30  False   1       5-4
6   99   True   0       6-6
7   11  False   1       7-6
8   47   True   0       8-8
9   25  False   1       9-8
10  78  False   2       10-8
11  40   True   0       11-11
12  99   True   0       12-12
13  7   False   1       13-12
14  6   False   2       14-12
15  64   True   0       15-15
16  62  False   1       16-15
17  39   True   0       17-17
18  41   True   0       18-18
19  28  False   1       19-18

Upvotes: 1

Views: 478

Answers (2)

jezrael
jezrael

Reputation: 862611

Use:

s = mdf['CM'].cumsum()
mdf["SINCE"] = mdf[s.gt(0)].groupby(s).cumcount()

print (mdf)
      A     CM  SINCE Formula
Idx                          
0    23  False    NaN     NaN
1    55  False    NaN     NaN
2    48  False    NaN     NaN
3    17  False    NaN     NaN
4    24   True    0.0     4-4
5    30  False    1.0     5-4
6    99   True    0.0     6-6
7    11  False    1.0     7-6
8    47   True    0.0     8-8
9    25  False    1.0     9-8
10   78  False    2.0    10-8
11   40   True    0.0   11-11
12   99   True    0.0   12-12
13    7  False    1.0   13-12
14    6  False    2.0   14-12
15   64   True    0.0   15-15
16   62  False    1.0   16-15
17   39   True    0.0   17-17
18   41   True    0.0   18-18
19   28  False    1.0   19-18

Upvotes: 1

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

You can try with cumulative sum:

cs = mdf.CM.cumsum()
mdf["SINCE"] = mdf.groupby(cs).cumcount()

# Nonify the values up until first True
mdf.loc[cs == 0, "SINCE"] = np.nan

to get (for 30 sample data):

     A     CM  SINCE
0   79  False    NaN
1   46  False    NaN
2   59  False    NaN
3   31  False    NaN
4   72  False    NaN
5   34  False    NaN
6   42  False    NaN
7   40  False    NaN
8   82   True    0.0
9   16  False    1.0
10   9  False    2.0
11  45   True    0.0
12  68   True    0.0
13  48  False    1.0
14  20   True    0.0
15   9  False    1.0
16  14  False    2.0
17  91   True    0.0
18  92   True    0.0
19   0  False    1.0
20  93   True    0.0
21   1  False    1.0
22  60  False    2.0
23   0  False    3.0
24  93   True    0.0
25  85  False    1.0
26  52   True    0.0
27  32  False    1.0
28  65   True    0.0
29  85  False    1.0

Upvotes: 1

Related Questions