Umang Garg
Umang Garg

Reputation: 115

How to find since how many days the value of a column is True?

Consider the following DataFrame returned by the following dictionary(code given below):

I want to create a new column in this DataFrame which tells us since how many days the values continuously True. (for each ticker i.e. groupby(ticker)).

For e.g. the values that should be there in the new column are written as comments in the below code (for the first few rows). Please comment if you've any doubts regarding the output that's needed:

{'DaysWithGain': {(Timestamp('2019-10-01 04:00:00+0000', tz='UTC'),
      'AAPL'): True,
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'AAPL'): True, #1
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'SHOP'): True, #1
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'ZM'): True, #1
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'AAPL'): True, #1
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'SHOP'): True, #1
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'ZM'): True, #1
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'AAPL'): True, #1
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'FSLY'): False, #0
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'SHOP'): True, # 1
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'ZM'): True,#1
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'AAPL'): True,#2
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'LVGO'): False,#0
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'SHOP'): True,#2
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'ZM'): True, #2
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'AAPL'): True, #2
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'LVGO'): False,#0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'SHOP'): True, #0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
     (Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'ZM'): True,#2
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'AAPL'): True,#2
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'LVGO'): False,#0
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'SHOP'): True,#2
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'UPLD'): False,#0
     (Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'ZM'): True, # 2
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'AAPL'): True, #2
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'FSLY'): False,#0
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'SHOP'): True, #2
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'UPLD'): False,#0
     (Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'ZM'): True,#2
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'AAPL'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'ZM'): True,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'ZM'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'UPLD'): False,
     (Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'ZM'): False,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'LVGO'): False,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'SHOP'): True,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'UPLD'): True,
     (Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'ZM'): False,
     (Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'AAPL'): True,
     (Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'FSLY'): False,
     (Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'LVGO'): False}}

Upvotes: 1

Views: 77

Answers (1)

piterbarg
piterbarg

Reputation: 8219

This requires a custom aggregation function. The following function takes a list-like series of True/False values and returns the number of True's at the end:

from functools import reduce
def num_last_true_vals(series):
    return reduce(lambda c,v: (c+1)*v, series, 0)

let's check. The following call

(
num_last_true_vals([True,True,True,True]), 
num_last_true_vals([True,False,True,True]),
num_last_true_vals([True,False,False,True]),
num_last_true_vals([True,False,True,False]),
)

returns

(4, 2, 1, 0)

as expected

Now on to your dataset. It is not very interesting in the sense that for each ticker all the DaysWithGain values are either all True or all False. So I modify it a bit to make sure the solution works as promised. With dd being the dictionary you provided, we do

import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(dd)
df.loc[(Timestamp('2019-10-03 10:00:00+00:00'),'AAPL'), 'DaysWithGain'] = False
df = df.sort_index(level=0)

note that we set one of the entries for 'AAPL' to False. Also note we sort by timestamp for good measure

Now on to the main act, using our custom num_last_true_vals function:

df.groupby(level=1).agg({'DaysWithGain':num_last_true_vals})

produces


        DaysWithGain
AAPL    8
FSLY    0
LVGO    0
SHOP    16
UPLD    0
ZM      16

where the column DaysWithGain returns the number of timestamps at the end that were True

Edit after comments by OP

You can use the same function on the rolling basis

so the following should work. Note the last couple of bits are for pretty-printing, you may decide to skip them

df2 = (df.groupby(level=1)
        .apply(lambda d: d.assign(ts_since_true = d['DaysWithGain']
            .rolling(window=1000, min_periods=1)
            .apply(num_last_true_vals)
        ))
        .reset_index()
        .sort_values(['level_1','level_0'])
    )

this produces (with your recently-edited dictionary)

    level_0                    level_1    DaysWithGain      ts_since_true
--  -------------------------  ---------  --------------  ---------------
 0  2019-10-01 04:00:00+00:00  AAPL       True                          1
 1  2019-10-01 10:00:00+00:00  AAPL       True                          2
 7  2019-10-01 16:00:00+00:00  AAPL       True                          3
13  2019-10-01 22:00:00+00:00  AAPL       True                          4
19  2019-10-02 04:00:00+00:00  AAPL       True                          5
25  2019-10-02 10:00:00+00:00  AAPL       True                          6
31  2019-10-02 16:00:00+00:00  AAPL       True                          7
37  2019-10-02 22:00:00+00:00  AAPL       True                          8
43  2019-10-03 04:00:00+00:00  AAPL       False                         0
49  2019-10-03 10:00:00+00:00  AAPL       False                         0
55  2019-10-03 16:00:00+00:00  AAPL       False                         0
61  2019-10-03 22:00:00+00:00  AAPL       False                         0
67  2019-10-04 04:00:00+00:00  AAPL       True                          1
73  2019-10-04 10:00:00+00:00  AAPL       True                          2
79  2019-10-04 16:00:00+00:00  AAPL       True                          3
85  2019-10-04 22:00:00+00:00  AAPL       True                          4
91  2019-10-07 04:00:00+00:00  AAPL       True                          5
97  2019-10-07 10:00:00+00:00  AAPL       True                          6
 2  2019-10-01 10:00:00+00:00  FSLY       False                         0
 8  2019-10-01 16:00:00+00:00  FSLY       False                         0
14  2019-10-01 22:00:00+00:00  FSLY       False                         0
20  2019-10-02 04:00:00+00:00  FSLY       False                         0
26  2019-10-02 10:00:00+00:00  FSLY       False                         0
32  2019-10-02 16:00:00+00:00  FSLY       False                         0
38  2019-10-02 22:00:00+00:00  FSLY       False                         0
44  2019-10-03 04:00:00+00:00  FSLY       False                         0
50  2019-10-03 10:00:00+00:00  FSLY       False                         0
56  2019-10-03 16:00:00+00:00  FSLY       False                         0
62  2019-10-03 22:00:00+00:00  FSLY       False                         0
68  2019-10-04 04:00:00+00:00  FSLY       False                         0
74  2019-10-04 10:00:00+00:00  FSLY       False                         0
80  2019-10-04 16:00:00+00:00  FSLY       False                         0
86  2019-10-04 22:00:00+00:00  FSLY       False                         0
92  2019-10-07 04:00:00+00:00  FSLY       False                         0
98  2019-10-07 10:00:00+00:00  FSLY       False                         0
 3  2019-10-01 10:00:00+00:00  LVGO       False                         0
 9  2019-10-01 16:00:00+00:00  LVGO       False                         0
15  2019-10-01 22:00:00+00:00  LVGO       False                         0
21  2019-10-02 04:00:00+00:00  LVGO       False                         0
27  2019-10-02 10:00:00+00:00  LVGO       False                         0
33  2019-10-02 16:00:00+00:00  LVGO       False                         0
39  2019-10-02 22:00:00+00:00  LVGO       False                         0
45  2019-10-03 04:00:00+00:00  LVGO       False                         0
51  2019-10-03 10:00:00+00:00  LVGO       False                         0
57  2019-10-03 16:00:00+00:00  LVGO       False                         0
63  2019-10-03 22:00:00+00:00  LVGO       False                         0
69  2019-10-04 04:00:00+00:00  LVGO       False                         0
75  2019-10-04 10:00:00+00:00  LVGO       False                         0
81  2019-10-04 16:00:00+00:00  LVGO       False                         0
87  2019-10-04 22:00:00+00:00  LVGO       False                         0
93  2019-10-07 04:00:00+00:00  LVGO       False                         0
99  2019-10-07 10:00:00+00:00  LVGO       False                         0
 4  2019-10-01 10:00:00+00:00  SHOP       True                          1
10  2019-10-01 16:00:00+00:00  SHOP       True                          2
16  2019-10-01 22:00:00+00:00  SHOP       True                          3
22  2019-10-02 04:00:00+00:00  SHOP       True                          4
28  2019-10-02 10:00:00+00:00  SHOP       True                          5
34  2019-10-02 16:00:00+00:00  SHOP       True                          6
40  2019-10-02 22:00:00+00:00  SHOP       True                          7
46  2019-10-03 04:00:00+00:00  SHOP       True                          8
52  2019-10-03 10:00:00+00:00  SHOP       True                          9
58  2019-10-03 16:00:00+00:00  SHOP       True                         10
64  2019-10-03 22:00:00+00:00  SHOP       True                         11
70  2019-10-04 04:00:00+00:00  SHOP       True                         12
76  2019-10-04 10:00:00+00:00  SHOP       True                         13
82  2019-10-04 16:00:00+00:00  SHOP       True                         14
88  2019-10-04 22:00:00+00:00  SHOP       True                         15
94  2019-10-07 04:00:00+00:00  SHOP       True                         16
 5  2019-10-01 10:00:00+00:00  UPLD       False                         0
11  2019-10-01 16:00:00+00:00  UPLD       False                         0
17  2019-10-01 22:00:00+00:00  UPLD       False                         0
23  2019-10-02 04:00:00+00:00  UPLD       False                         0
29  2019-10-02 10:00:00+00:00  UPLD       False                         0
35  2019-10-02 16:00:00+00:00  UPLD       False                         0
41  2019-10-02 22:00:00+00:00  UPLD       False                         0
47  2019-10-03 04:00:00+00:00  UPLD       False                         0
53  2019-10-03 10:00:00+00:00  UPLD       False                         0
59  2019-10-03 16:00:00+00:00  UPLD       False                         0
65  2019-10-03 22:00:00+00:00  UPLD       False                         0
71  2019-10-04 04:00:00+00:00  UPLD       False                         0
77  2019-10-04 10:00:00+00:00  UPLD       False                         0
83  2019-10-04 16:00:00+00:00  UPLD       False                         0
89  2019-10-04 22:00:00+00:00  UPLD       False                         0
95  2019-10-07 04:00:00+00:00  UPLD       True                          1
 6  2019-10-01 10:00:00+00:00  ZM         True                          1
12  2019-10-01 16:00:00+00:00  ZM         True                          2
18  2019-10-01 22:00:00+00:00  ZM         True                          3
24  2019-10-02 04:00:00+00:00  ZM         True                          4
30  2019-10-02 10:00:00+00:00  ZM         True                          5
36  2019-10-02 16:00:00+00:00  ZM         True                          6
42  2019-10-02 22:00:00+00:00  ZM         True                          7
48  2019-10-03 04:00:00+00:00  ZM         True                          8
54  2019-10-03 10:00:00+00:00  ZM         True                          9
60  2019-10-03 16:00:00+00:00  ZM         True                         10
66  2019-10-03 22:00:00+00:00  ZM         True                         11
72  2019-10-04 04:00:00+00:00  ZM         True                         12
78  2019-10-04 10:00:00+00:00  ZM         True                         13
84  2019-10-04 16:00:00+00:00  ZM         False                         0
90  2019-10-04 22:00:00+00:00  ZM         False                         0
96  2019-10-07 04:00:00+00:00  ZM         False                         0

Upvotes: 1

Related Questions