Reputation: 1021
I have an event that occurs at a random date between two other events. I need to mark that random event if it falls between two specific events. Let's assume the random event is Z and that it occurs within the sequence of events A-D. So if event Z occurs between events A and D, I need to mark it in a new column.
Here is a short example from the original DF:
events date
A 1/14/2020
Z 1/20/2020
B 2/15/2020
D 2/28/2020
A 3/3/2020
B 2/5/2020
C 2/6/2020
D 2/9/2020
A 2/12/2020
Z 2/13/2020
B 2/16/2020
D 2/20/2020
Z 2/21/2020
A 2/22/2020
B 2/23/2020
C 2/24/2020
Desired output:
events date temp
A 1/14/2020 True
Z 1/20/2020 True
B 2/15/2020 True
D 2/28/2020 True
A 3/3/2020 False
B 2/5/2020 False
C 2/6/2020 False
D 2/9/2020 False
A 2/12/2020 True
Z 2/13/2020 True
B 2/16/2020 True
D 2/20/2020 True
Z 2/21/2020 False
A 2/22/2020 False
B 2/23/2020 False
C 2/24/2020 False
So technically, if event Z falls outside an A-D sequence of events, it should be marked False. However, if it falls within an A-D or A-C sequence, it should be marked True.
Upvotes: 1
Views: 70
Reputation: 1012
Based on your input data, you can use groupby to find the events. Just make sure they're in the correct order.
First, make a couple of markers to determine the distinct sets of events. The bfill will allow you to identify and ignore the "Z" events that occur after the final non-"Z" event of a group (i.e., just before the next "A"):
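import numpy as np
import pandas as pd
# Each "A" starts a new group
df['grp'] = (df['events']=='A').cumsum()
# Blank out grp on "Z" rows, then back-fill them from the next non-"Z" row
# (.bfill() replaces the deprecated fillna(method='bfill'))
df['grp2'] = df['grp'].where(df['events']!='Z').bfill()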
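To see what the two markers look like, here is a minimal sketch on the sample events from the question. The DataFrame below is rebuilt by hand (dates omitted), and `Series.where` is used as one way to blank out the "Z" rows; treat it as an illustration rather than the only way to write this step.

```python
import pandas as pd

# Sample events from the question, in their original order
df = pd.DataFrame({"events": list("AZBDABCDAZBDZABC")})

# grp: running count of "A" events, so each "A" opens a new group
df["grp"] = (df["events"] == "A").cumsum()

# grp2: same as grp, except "Z" rows take the value of the next non-"Z" row
df["grp2"] = df["grp"].where(df["events"] != "Z").bfill()

# Rows where the two markers disagree are "Z" events sitting just before an "A"
print(df[df["grp"] != df["grp2"]])
```

On this data the only row where grp and grp2 differ is the "Z" at index 12, which is the one that should end up False.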
...and a small function to check whether a "Z" event occurs in the middle of a group, and whether the group ends with a "C" or "D" event. If a grouping like A Z B or A Z E is valid (i.e., the group does not have to end on "C" or "D" specifically, and you are just trying to throw out all "Z" events that occur just prior to an "A"), you can use the commented line instead.
Also, you could use a lambda, but I think this is clearer:
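def checker(x):
    # Looser variant: accept any group containing a "Z", whatever its last event is
    #return np.any(x['events']=='Z') & (x['grp']==x['grp2'])
    # Strict variant: the group must also end on "C" or "D"
    return np.any(x['events']=='Z') & (x['grp']==x['grp2']) & (x['events'].isin(['C','D']).iloc[-1])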
This groupby yields the temp column for each set of events:
check = df.groupby(['grp', 'grp2']).apply(checker).rename('temp').reset_index()
Finally, you can pd.concat that result back against the original dataframe:
df = pd.concat([df[['events','date']], check[['temp']]], axis=1)
print(df)
events date temp
0 A 2020-01-14 True
1 Z 2020-01-20 True
2 B 2020-02-15 True
3 D 2020-02-28 True
4 A 2020-03-03 False
5 B 2020-02-05 False
6 C 2020-02-06 False
7 D 2020-02-09 False
8 A 2020-02-12 True
9 Z 2020-02-13 True
10 B 2020-02-16 True
11 D 2020-02-20 True
12 Z 2020-02-21 False
13 A 2020-02-22 False
14 B 2020-02-23 False
15 D 2020-02-24 False
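For anyone who wants to run the whole thing in one go, here is a self-contained sketch of the same idea on the question's sample data. It is a variation, not the exact code above: it loops over the groups and writes the flag back by index instead of using apply plus pd.concat, which is an assumption on my part to avoid depending on how groupby.apply shapes its result across pandas versions. The dates are kept as plain strings.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "events": list("AZBDABCDAZBDZABC"),
    "date": ["1/14/2020", "1/20/2020", "2/15/2020", "2/28/2020",
             "3/3/2020", "2/5/2020", "2/6/2020", "2/9/2020",
             "2/12/2020", "2/13/2020", "2/16/2020", "2/20/2020",
             "2/21/2020", "2/22/2020", "2/23/2020", "2/24/2020"],
})

# Group markers: grp counts "A" events, grp2 back-fills "Z" rows
# from the next non-"Z" row
tmp = df.assign(grp=(df["events"] == "A").cumsum())
tmp["grp2"] = tmp["grp"].where(tmp["events"] != "Z").bfill()

# Default to False; rows whose grp2 is NaN (a trailing "Z" at the very end
# of the frame) are dropped by groupby and therefore keep this default
temp = pd.Series(False, index=df.index)

for _, block in tmp.groupby(["grp", "grp2"]):
    has_z = (block["events"] == "Z").any()             # group contains a "Z"
    ends_ok = block["events"].iloc[-1] in ("C", "D")   # group ends on "C" or "D"
    same = block["grp"] == block["grp2"]               # "Z" is not dangling before an "A"
    temp.loc[block.index] = same & has_z & ends_ok

df["temp"] = temp
print(df)
```

Because the flag is assigned through the original index, the row order of the groups does not matter here.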
Upvotes: 1