Sam Al-Ghammari

Reputation: 1021

Mark a specific event that occurs between two futuristic events

I have an event that occurs on a random date between two other events. I need to mark that random event when it falls between two specific events. Say the random event is Z, and it can appear within the sequence of events A through D. If Z appears between A and D, I need to mark it in a new column.

Here is a short example from the original DataFrame:

events  date
A     1/14/2020
Z     1/20/2020
B     2/15/2020
D     2/28/2020
A     3/3/2020
B     2/5/2020
C     2/6/2020
D     2/9/2020
A     2/12/2020
Z     2/13/2020
B     2/16/2020
D     2/20/2020
Z     2/21/2020
A     2/22/2020
B     2/23/2020
C     2/24/2020

Desired output:

events  date    temp
A   1/14/2020   True
Z   1/20/2020   True
B   2/15/2020   True
D   2/28/2020   True
A   3/3/2020    False
B   2/5/2020    False
C   2/6/2020    False
D   2/9/2020    False
A   2/12/2020   True
Z   2/13/2020   True
B   2/16/2020   True
D   2/20/2020   True
Z   2/21/2020   False
A   2/22/2020   False
B   2/23/2020   False
C   2/24/2020   False

So if event Z falls outside an A-D sequence, it should be marked False. If it falls within an A-D or A-C sequence, it should be marked True, along with the rest of that sequence.

Upvotes: 1

Views: 70

Answers (1)

Rick M

Reputation: 1012

Based on your input data, you can use groupby to find the events. Just make sure the rows are in the correct order.

First, make a couple of markers to determine the distinct sets of events. The bfill will allow you to identify and ignore the "Z" events that occur after a final non-"Z" event:

import numpy as np
import pandas as pd

# Each "A" starts a new group of events
df['grp'] = (df['events'] == 'A').cumsum()
# Blank out the "Z" rows, then back-fill them from the next non-"Z" row
df['grp2'] = df['grp'].mask(df['events'] == 'Z').bfill()
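
To see what the two markers end up holding, here is a minimal sketch that rebuilds them on the events column from the question (dates left out, since only the row order matters here). It builds grp2 in one step with Series.mask(...).bfill(); the `stray` variable is just an illustrative name, not part of the answer's code.

```python
import pandas as pd

# Sample events from the question, in row order
df = pd.DataFrame({"events": list("AZBDABCDAZBDZABC")})

# grp: running count of "A" rows, so each "A" opens a new group
df["grp"] = (df["events"] == "A").cumsum()
# grp2: same as grp, except each "Z" takes the group of the next non-"Z" row
df["grp2"] = df["grp"].mask(df["events"] == "Z").bfill()

# Rows where the two markers disagree are "Z" events sitting just before an "A"
stray = df.index[df["grp"] != df["grp2"]].tolist()
print(df)
print("stray Z rows:", stray)
```

On this data the only row where grp and grp2 differ is the "Z" at index 12, which is the one that should end up False.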

...and a small function to check if a "Z" event occurs in the middle of a group, and whether or not the group ends with a "C" or "D" event. If a grouping like A Z B or A Z E is valid (i.e., not ending on "C" or "D" specifically, but rather just trying to throw out all "Z" events that occur just prior to an "A") you can use the commented line instead.

Also, you could use a lambda, but I think this is clearer:

def checker(x):
    #return np.any(x['events']=='Z') & (x['grp']==x['grp2'])
    return np.any(x['events']=='Z') & (x['grp']==x['grp2']) & (x['events'].isin(['C','D']).iloc[-1])

This groupby yields the temp column for each set of events:

check = df.groupby(['grp', 'grp2']).apply(checker).rename('temp').reset_index()

Finally, you can pd.concat that result back against the original dataframe:

df = pd.concat([df[['events','date']], check[['temp']]], axis=1)

print(df)
   events       date   temp
0       A 2020-01-14   True
1       Z 2020-01-20   True
2       B 2020-02-15   True
3       D 2020-02-28   True
4       A 2020-03-03  False
5       B 2020-02-05  False
6       C 2020-02-06  False
7       D 2020-02-09  False
8       A 2020-02-12   True
9       Z 2020-02-13   True
10      B 2020-02-16   True
11      D 2020-02-20   True
12      Z 2020-02-21  False
13      A 2020-02-22  False
14      B 2020-02-23  False
15      C 2020-02-24  False
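
One thing to watch: the concat step relies on the grouped result coming back in the same row order as the original frame. If that is a concern, a variant that writes the flag straight onto each row with transform might be easier to reason about. This is only a sketch under the same assumptions (rows already in order, each "A" opens a group); the function name `mark_z_groups` and its `closers` parameter are made up for illustration.

```python
import pandas as pd

def mark_z_groups(df, closers=("C", "D")):
    """Flag rows whose A-started group contains a Z and ends on a closing event."""
    is_z = df["events"] == "Z"
    grp = (df["events"] == "A").cumsum()
    # A "Z" borrows the group of the next non-"Z" row, so a "Z" right before
    # an "A" (or at the very end, where it stays NaN) no longer matches grp
    grp2 = grp.mask(is_z).bfill()
    inside = grp == grp2
    # Does the group contain at least one "Z" that really sits inside it?
    has_z = (is_z & inside).groupby(grp).transform("any")
    # Last event among the rows that belong to the group (stray rows are NaN
    # and are skipped by "last")
    last_event = df["events"].where(inside).groupby(grp).transform("last")
    return inside & has_z & last_event.isin(list(closers))

# Events column from the question, in row order
df = pd.DataFrame({"events": list("AZBDABCDAZBDZABC")})
df["temp"] = mark_z_groups(df)
print(df)
```

Because the result keeps the original index, it can be assigned as a column directly, and a trailing "Z" with nothing after it is simply flagged False instead of being dropped by the groupby.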

Upvotes: 1
