Reputation: 115
Consider the following DataFrame returned by the following dictionary(code given below):
I want to create a new column in this DataFrame which tells us since how many days the values continuously True. (for each ticker i.e. groupby(ticker)).
For e.g. the values that should be there in the new column are written as comments in the below code (for the first few rows). Please comment if you've any doubts regarding the output that's needed:
{'DaysWithGain': {(Timestamp('2019-10-01 04:00:00+0000', tz='UTC'),
'AAPL'): True,
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'AAPL'): True, #1
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'SHOP'): True, #1
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-01 10:00:00+0000', tz='UTC'), 'ZM'): True, #1
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'AAPL'): True, #1
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'SHOP'): True, #1
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-01 16:00:00+0000', tz='UTC'), 'ZM'): True, #1
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'AAPL'): True, #1
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'FSLY'): False, #0
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'SHOP'): True, # 1
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-01 22:00:00+0000', tz='UTC'), 'ZM'): True,#1
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'AAPL'): True,#2
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'LVGO'): False,#0
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'SHOP'): True,#2
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-02 04:00:00+0000', tz='UTC'), 'ZM'): True, #2
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'AAPL'): True, #2
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'FSLY'): False, #0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'LVGO'): False,#0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'SHOP'): True, #0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'UPLD'): False, #0
(Timestamp('2019-10-02 10:00:00+0000', tz='UTC'), 'ZM'): True,#2
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'AAPL'): True,#2
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'LVGO'): False,#0
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'SHOP'): True,#2
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'UPLD'): False,#0
(Timestamp('2019-10-02 16:00:00+0000', tz='UTC'), 'ZM'): True, # 2
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'AAPL'): True, #2
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'FSLY'): False,#0
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'LVGO'): False, #0
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'SHOP'): True, #2
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'UPLD'): False,#0
(Timestamp('2019-10-02 22:00:00+0000', tz='UTC'), 'ZM'): True,#2
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 04:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 10:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 16:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'AAPL'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-03 22:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 04:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 10:00:00+0000', tz='UTC'), 'ZM'): True,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 16:00:00+0000', tz='UTC'), 'ZM'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'UPLD'): False,
(Timestamp('2019-10-04 22:00:00+0000', tz='UTC'), 'ZM'): False,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'LVGO'): False,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'SHOP'): True,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'UPLD'): True,
(Timestamp('2019-10-07 04:00:00+0000', tz='UTC'), 'ZM'): False,
(Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'AAPL'): True,
(Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'FSLY'): False,
(Timestamp('2019-10-07 10:00:00+0000', tz='UTC'), 'LVGO'): False}}
Upvotes: 1
Views: 77
Reputation: 8219
This requires a custom aggregation function. The following function takes a list-like series
of True/False values and returns the number of True's at the end:
from functools import reduce
def num_last_true_vals(series):
return reduce(lambda c,v: (c+1)*v, series, 0)
let's check. The following call
(
num_last_true_vals([True,True,True,True]),
num_last_true_vals([True,False,True,True]),
num_last_true_vals([True,False,False,True]),
num_last_true_vals([True,False,True,False]),
)
returns
(4, 2, 1, 0)
as expected
Now on to your dataset. It is not very interesting in the sense that for each ticker all the DaysWithGain
values are either all True or all False. So I modify it a bit to make sure the solution works as promised. With dd
being the dictionary you provided, we do
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(dd)
df.loc[(Timestamp('2019-10-03 10:00:00+00:00'),'AAPL'), 'DaysWithGain'] = False
df = df.sort_index(level=0)
note that we set one of the entries for 'AAPL' to False. Also note we sort by timestamp for good measure
Now on to the main act, using our custom num_last_true_vals
function:
df.groupby(level=1).agg({'DaysWithGain':num_last_true_vals})
produces
DaysWithGain
AAPL 8
FSLY 0
LVGO 0
SHOP 16
UPLD 0
ZM 16
where the column DaysWithGain
returns the number of timestamps at the end that were True
You can use the same function on the rolling basis
so the following should work. Note the last couple of bits are for pretty-printing, you may decide to skip them
df2 = (df.groupby(level=1)
.apply(lambda d: d.assign(ts_since_true = d['DaysWithGain']
.rolling(window=1000, min_periods=1)
.apply(num_last_true_vals)
))
.reset_index()
.sort_values(['level_1','level_0'])
)
this produces (with your recently-edited dictionary)
level_0 level_1 DaysWithGain ts_since_true
-- ------------------------- --------- -------------- ---------------
0 2019-10-01 04:00:00+00:00 AAPL True 1
1 2019-10-01 10:00:00+00:00 AAPL True 2
7 2019-10-01 16:00:00+00:00 AAPL True 3
13 2019-10-01 22:00:00+00:00 AAPL True 4
19 2019-10-02 04:00:00+00:00 AAPL True 5
25 2019-10-02 10:00:00+00:00 AAPL True 6
31 2019-10-02 16:00:00+00:00 AAPL True 7
37 2019-10-02 22:00:00+00:00 AAPL True 8
43 2019-10-03 04:00:00+00:00 AAPL False 0
49 2019-10-03 10:00:00+00:00 AAPL False 0
55 2019-10-03 16:00:00+00:00 AAPL False 0
61 2019-10-03 22:00:00+00:00 AAPL False 0
67 2019-10-04 04:00:00+00:00 AAPL True 1
73 2019-10-04 10:00:00+00:00 AAPL True 2
79 2019-10-04 16:00:00+00:00 AAPL True 3
85 2019-10-04 22:00:00+00:00 AAPL True 4
91 2019-10-07 04:00:00+00:00 AAPL True 5
97 2019-10-07 10:00:00+00:00 AAPL True 6
2 2019-10-01 10:00:00+00:00 FSLY False 0
8 2019-10-01 16:00:00+00:00 FSLY False 0
14 2019-10-01 22:00:00+00:00 FSLY False 0
20 2019-10-02 04:00:00+00:00 FSLY False 0
26 2019-10-02 10:00:00+00:00 FSLY False 0
32 2019-10-02 16:00:00+00:00 FSLY False 0
38 2019-10-02 22:00:00+00:00 FSLY False 0
44 2019-10-03 04:00:00+00:00 FSLY False 0
50 2019-10-03 10:00:00+00:00 FSLY False 0
56 2019-10-03 16:00:00+00:00 FSLY False 0
62 2019-10-03 22:00:00+00:00 FSLY False 0
68 2019-10-04 04:00:00+00:00 FSLY False 0
74 2019-10-04 10:00:00+00:00 FSLY False 0
80 2019-10-04 16:00:00+00:00 FSLY False 0
86 2019-10-04 22:00:00+00:00 FSLY False 0
92 2019-10-07 04:00:00+00:00 FSLY False 0
98 2019-10-07 10:00:00+00:00 FSLY False 0
3 2019-10-01 10:00:00+00:00 LVGO False 0
9 2019-10-01 16:00:00+00:00 LVGO False 0
15 2019-10-01 22:00:00+00:00 LVGO False 0
21 2019-10-02 04:00:00+00:00 LVGO False 0
27 2019-10-02 10:00:00+00:00 LVGO False 0
33 2019-10-02 16:00:00+00:00 LVGO False 0
39 2019-10-02 22:00:00+00:00 LVGO False 0
45 2019-10-03 04:00:00+00:00 LVGO False 0
51 2019-10-03 10:00:00+00:00 LVGO False 0
57 2019-10-03 16:00:00+00:00 LVGO False 0
63 2019-10-03 22:00:00+00:00 LVGO False 0
69 2019-10-04 04:00:00+00:00 LVGO False 0
75 2019-10-04 10:00:00+00:00 LVGO False 0
81 2019-10-04 16:00:00+00:00 LVGO False 0
87 2019-10-04 22:00:00+00:00 LVGO False 0
93 2019-10-07 04:00:00+00:00 LVGO False 0
99 2019-10-07 10:00:00+00:00 LVGO False 0
4 2019-10-01 10:00:00+00:00 SHOP True 1
10 2019-10-01 16:00:00+00:00 SHOP True 2
16 2019-10-01 22:00:00+00:00 SHOP True 3
22 2019-10-02 04:00:00+00:00 SHOP True 4
28 2019-10-02 10:00:00+00:00 SHOP True 5
34 2019-10-02 16:00:00+00:00 SHOP True 6
40 2019-10-02 22:00:00+00:00 SHOP True 7
46 2019-10-03 04:00:00+00:00 SHOP True 8
52 2019-10-03 10:00:00+00:00 SHOP True 9
58 2019-10-03 16:00:00+00:00 SHOP True 10
64 2019-10-03 22:00:00+00:00 SHOP True 11
70 2019-10-04 04:00:00+00:00 SHOP True 12
76 2019-10-04 10:00:00+00:00 SHOP True 13
82 2019-10-04 16:00:00+00:00 SHOP True 14
88 2019-10-04 22:00:00+00:00 SHOP True 15
94 2019-10-07 04:00:00+00:00 SHOP True 16
5 2019-10-01 10:00:00+00:00 UPLD False 0
11 2019-10-01 16:00:00+00:00 UPLD False 0
17 2019-10-01 22:00:00+00:00 UPLD False 0
23 2019-10-02 04:00:00+00:00 UPLD False 0
29 2019-10-02 10:00:00+00:00 UPLD False 0
35 2019-10-02 16:00:00+00:00 UPLD False 0
41 2019-10-02 22:00:00+00:00 UPLD False 0
47 2019-10-03 04:00:00+00:00 UPLD False 0
53 2019-10-03 10:00:00+00:00 UPLD False 0
59 2019-10-03 16:00:00+00:00 UPLD False 0
65 2019-10-03 22:00:00+00:00 UPLD False 0
71 2019-10-04 04:00:00+00:00 UPLD False 0
77 2019-10-04 10:00:00+00:00 UPLD False 0
83 2019-10-04 16:00:00+00:00 UPLD False 0
89 2019-10-04 22:00:00+00:00 UPLD False 0
95 2019-10-07 04:00:00+00:00 UPLD True 1
6 2019-10-01 10:00:00+00:00 ZM True 1
12 2019-10-01 16:00:00+00:00 ZM True 2
18 2019-10-01 22:00:00+00:00 ZM True 3
24 2019-10-02 04:00:00+00:00 ZM True 4
30 2019-10-02 10:00:00+00:00 ZM True 5
36 2019-10-02 16:00:00+00:00 ZM True 6
42 2019-10-02 22:00:00+00:00 ZM True 7
48 2019-10-03 04:00:00+00:00 ZM True 8
54 2019-10-03 10:00:00+00:00 ZM True 9
60 2019-10-03 16:00:00+00:00 ZM True 10
66 2019-10-03 22:00:00+00:00 ZM True 11
72 2019-10-04 04:00:00+00:00 ZM True 12
78 2019-10-04 10:00:00+00:00 ZM True 13
84 2019-10-04 16:00:00+00:00 ZM False 0
90 2019-10-04 22:00:00+00:00 ZM False 0
96 2019-10-07 04:00:00+00:00 ZM False 0
Upvotes: 1