pythOnometrist
pythOnometrist

Reputation: 6809

Pandas : Event Flagging in time series

Hi I am trying to generate a flag on time series data to identify the observations falling within +-3 time periods of the event. Event dates can vary for each place. Not sure how to approach in pandas. Any suggestions would be most welcome.

enter image description here

Upvotes: 0

Views: 495

Answers (2)

Scott Boston
Scott Boston

Reputation: 153470

Let's try:

import pandas as pd
import numpy as np

d = pd.date_range('2019-01-01', '2019-01-10', freq='D')

df = pd.DataFrame({'ID':np.arange(1,21)
                  ,'Place':['A']*10+['B']*10
                  ,'Date':d.to_list() * 2
                  ,'event':[0]*5+[1]+[0]*7+[1]+[0]*6
                  ,'Flag':[0,0,-3,-2,-1,1,2,3,4,0,-3,-2,-1,
                          1,2,3,4,0,0,0]},
                 index = np.arange(1,21))

n=3
s = df['event'].rolling(n*2+1, center=True, min_periods=1).max()
s = s.cumsum()

l = ((s - s.where(s.duplicated()).ffill())).fillna(0) 
l.update(l[l>n]+1)

df['Flag'] = (l - n-1).where(l.gt(0), 0)
print(df)

Output:

    ID Place       Date  event  Flag
1    1     A 2019-01-01      0   0.0
2    2     A 2019-01-02      0   0.0
3    3     A 2019-01-03      0  -3.0
4    4     A 2019-01-04      0  -2.0
5    5     A 2019-01-05      0  -1.0
6    6     A 2019-01-06      1   1.0
7    7     A 2019-01-07      0   2.0
8    8     A 2019-01-08      0   3.0
9    9     A 2019-01-09      0   4.0
10  10     A 2019-01-10      0   0.0
11  11     B 2019-01-01      0  -3.0
12  12     B 2019-01-02      0  -2.0
13  13     B 2019-01-03      0  -1.0
14  14     B 2019-01-04      1   1.0
15  15     B 2019-01-05      0   2.0
16  16     B 2019-01-06      0   3.0
17  17     B 2019-01-07      0   4.0
18  18     B 2019-01-08      0   0.0
19  19     B 2019-01-09      0   0.0
20  20     B 2019-01-10      0   0.0

Upvotes: 2

ALollz
ALollz

Reputation: 59549

We can use pd.merge_asof to bring the closest event 'Date' to each row within each 'Place'. I use np.select to get your Flag column correct, since it follows somewhat odd logic.

If windows overlap, you can change the ordering in conds and choices to give one priority over the other without overhauling any of the other logic.

import pandas as pd
import numpy as np
df['Date'] = pd.to_datetime(df.Date)

# Bring closest event date to `df`. Sort ruins order, but can fix later.
df = pd.merge_asof(df.sort_values('Date'), 
                   (df.loc[df.Event.eq(1), ['Place', 'Date']]
                      .sort_values('Date')
                      .rename(columns={'Date': 'Date_2'})), 
                   by='Place', 
                   direction='nearest',
                   left_on='Date',
                   right_on='Date_2')

# Find day difference, needs to be adjusted in subsequent steps
df['Flag'] = (df['Date'] - df['Date_2']).dt.days

# Fix Flag to be desired final value.
conds = [df.Flag.ge(-3) & df.Flag.lt(0), df.Flag.ge(0) & df.Flag.le(3)]
choices = [df.Flag, df.Flag+1]
df['Flag'] = np.select(conds, choices, default=0)

Output: print(df.sort_values('ID'))

    ID Place       Date  Event     Date_2  Flag
0    1     A 2019-01-01      0 2019-01-06     0
2    2     A 2019-01-02      0 2019-01-06     0
4    3     A 2019-01-03      0 2019-01-06    -3
6    4     A 2019-01-04      0 2019-01-06    -2
8    5     A 2019-01-05      0 2019-01-06    -1
10   6     A 2019-01-06      1 2019-01-06     1
12   7     A 2019-01-07      0 2019-01-06     2
15   8     A 2019-01-08      0 2019-01-06     3
17   9     A 2019-01-09      0 2019-01-06     4
18  10     A 2019-01-10      0 2019-01-06     0
1   11     B 2019-01-01      0 2019-01-04    -3
3   12     B 2019-01-02      0 2019-01-04    -2
5   13     B 2019-01-03      0 2019-01-04    -1
7   14     B 2019-01-04      1 2019-01-04     1
9   15     B 2019-01-05      0 2019-01-04     2
11  16     B 2019-01-06      0 2019-01-04     3
13  17     B 2019-01-07      0 2019-01-04     4
14  18     B 2019-01-08      0 2019-01-04     0
16  19     B 2019-01-09      0 2019-01-04     0
19  20     B 2019-01-10      0 2019-01-04     0

Upvotes: 1

Related Questions