Reputation: 23567
Let's say I have two data frames of 1/0 (True or False) values. Call the first one a and the second one b. Is there a way, without looping, to return true only when a is true and b was true at any time within the last n observations? For example, assume n=2. In the example below, a is true on 2019-10-11, so we look at column b: if b was also true within the last n observations, column a on 2019-10-11 is valid and stays true; otherwise it is set to zero.
            a  b
2019-10-08  0  0
2019-10-09  0  0
2019-10-10  0  1
2019-10-11  1  0
2019-10-14  0  0
2019-10-15  0  0
2019-10-16  0  0
My attempt below, too slow...
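import pandas as pd

def compute_stats(z, n, df):
    # Position of the first row of this group in the full frame
    end_idx = z.iloc[0].Index
    # Any true value in column b (position 1) in the n rows before it?
    if (df.iloc[(end_idx - n):end_idx, 1] * 1).sum() > 0:
        return 1
    else:
        return 0

# data1 holds a, data2 holds b; the running sum of a labels each signal
x = data1.cumsum()
x.name = "Signal"
df = pd.concat([data1, data2, x], axis=1)
df['Index'] = list(range(0, len(data1)))
tmp = df.groupby("Signal").apply(lambda z: compute_stats(z, n, df))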
In my attempt, I essentially create a separate ID column that labels the rows belonging to each signal, then group by it. Inside the function called by groupby, I just look back to see whether there are any True values in column b.
Thanks
Upvotes: 2
Views: 72
Reputation: 323306
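We can do this with ffill and its limit argument: mask the zeros in b, forward-fill each 1 for at most n rows, then combine with a.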
df.a.eq(1)&df.b.mask(df.b==0).ffill(limit=2).eq(1)
Out[205]:
2019-10-08 False
2019-10-09 False
2019-10-10 False
2019-10-11 True
2019-10-14 False
2019-10-15 False
2019-10-16 False
dtype: bool
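To make the one-liner above easier to reuse, here is a minimal sketch that rebuilds the sample frame from the question and wraps the same ffill idea in a function taking n. The helper names confirm_with_ffill and confirm_with_rolling are made up for illustration. The second helper is an alternative I am assuming is equivalent (a rolling maximum over the current row plus the n rows before it), not something from the original answer. Note that both count the current row itself as part of the look-back.

```python
import pandas as pd

# Sample data copied from the question.
idx = pd.to_datetime([
    "2019-10-08", "2019-10-09", "2019-10-10", "2019-10-11",
    "2019-10-14", "2019-10-15", "2019-10-16",
])
df = pd.DataFrame(
    {"a": [0, 0, 0, 1, 0, 0, 0], "b": [0, 0, 1, 0, 0, 0, 0]},
    index=idx,
)


def confirm_with_ffill(df, n):
    """Hypothetical helper: a is 1 and b was 1 on this row or within the n rows before it."""
    # Turn zeros into NaN so only the 1s are carried forward, for at most n rows.
    recent_b = df["b"].mask(df["b"] == 0).ffill(limit=n).eq(1)
    return df["a"].eq(1) & recent_b


def confirm_with_rolling(df, n):
    """Hypothetical alternative: rolling max over the current row plus the n previous rows."""
    recent_b = df["b"].rolling(n + 1, min_periods=1).max().eq(1)
    return df["a"].eq(1) & recent_b


n = 2
via_ffill = confirm_with_ffill(df, n)
via_rolling = confirm_with_rolling(df, n)
print(via_ffill)
```

If the current row of b should not count, one option is to shift b by one row before applying either helper, but that depends on how "last n observations" is meant in the question.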
Upvotes: 5