user1234440

Reputation: 23567

Pandas Conditional True

Let's say I have two data frames of 1/0 (True/False) values. Call the first one a and the second one b. Is there a way, without looping, to return True whenever a is True and b was True at any time within the last n observations? For example, assume n=2. In the data below, a is True on 2019-10-11, so we look at column b: if b is also True within the last n observations, a on 2019-10-11 is valid and stays True; otherwise it is set to zero.

            a  b
2019-10-08  0  0
2019-10-09  0  0
2019-10-10  0  1
2019-10-11  1  0
2019-10-14  0  0
2019-10-15  0  0
2019-10-16  0  0

My attempt is below, but it is too slow:

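import pandas as pd

# data1 and data2 are the 0/1 Series for columns a and b; n is the lookback length.
def compute_stats(z, n, df):
    # Positional index of the first row in this signal group.
    end_idx = z["Index"].iloc[0]
    # Clamp at 0: a negative start would be counted from the end of the frame.
    start_idx = max(end_idx - n, 0)

    # Column 1 is b: was it ever 1 in the n rows before end_idx?
    if (df.iloc[start_idx:end_idx, 1] * 1).sum() > 0:
        return 1
    else:
        return 0

# Each 1 in a starts a new group ID.
x = data1.cumsum()
x.name = "Signal"

df = pd.concat([data1, data2, x], axis=1)
df['Index'] = list(range(0, len(data1)))
tmp = df.groupby("Signal").apply(lambda z: compute_stats(z, n, df))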

In my attempt, I take the cumulative sum of a to create an ID column that labels each signal, then group by that ID. Inside the function called by groupby, I look back from the first row of each group to see whether there are any True values in column b.

Thanks

Upvotes: 2

Views: 72

Answers (1)

BENY

Reputation: 323306

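We can do this with ffill and its limit argument: mask the zeros in b so they become NaN, forward-fill each 1 for at most n rows (n=2 here), and combine the result with a.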

df.a.eq(1)&df.b.mask(df.b==0).ffill(limit=2).eq(1)
Out[205]: 
2019-10-08    False
2019-10-09    False
2019-10-10    False
2019-10-11     True
2019-10-14    False
2019-10-15    False
2019-10-16    False
dtype: bool
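
For comparison, here is a minimal sketch of the same check written with a rolling window, using the sample data from the question. The names via_ffill, via_rolling and via_strict are made up for illustration. Note that the ffill version also counts a 1 in b on the current row; the last variant assumes a stricter reading in which only the n rows before the current one count, which is closer to what the question's own loop does.

```python
import pandas as pd

# Sample data copied from the question.
idx = pd.to_datetime([
    "2019-10-08", "2019-10-09", "2019-10-10", "2019-10-11",
    "2019-10-14", "2019-10-15", "2019-10-16",
])
df = pd.DataFrame({"a": [0, 0, 0, 1, 0, 0, 0],
                   "b": [0, 0, 1, 0, 0, 0, 0]}, index=idx)
n = 2

# Approach from the answer: each 1 in b is carried forward for up to n rows.
via_ffill = df.a.eq(1) & df.b.mask(df.b == 0).ffill(limit=n).eq(1)

# Rolling alternative: b was 1 on the current row or any of the n rows before it.
b_recent = df.b.rolling(n + 1, min_periods=1).max().eq(1)
via_rolling = df.a.eq(1) & b_recent

# Stricter variant (an assumption about the intent): only the n rows
# before the current one count, so b is shifted down by one row first.
b_before = df.b.shift(1).rolling(n, min_periods=1).max().eq(1)
via_strict = df.a.eq(1) & b_before

print(via_rolling)
```

On this sample all three versions flag only 2019-10-11. The two readings differ only when a and b are both 1 on the same row with no 1 in b during the preceding n rows.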

Upvotes: 5
