Reputation: 6809
Hi I am trying to generate a flag on time series data to identify the observations falling within +-3 time periods of the event. Event dates can vary for each place. Not sure how to approach in pandas. Any suggestions would be most welcome.
Upvotes: 0
Views: 495
Reputation: 153470
Let's try:
import pandas as pd
import numpy as np
d = pd.date_range('2019-01-01', '2019-01-10', freq='D')
df = pd.DataFrame({'ID':np.arange(1,21)
,'Place':['A']*10+['B']*10
,'Date':d.to_list() * 2
,'event':[0]*5+[1]+[0]*7+[1]+[0]*6
,'Flag':[0,0,-3,-2,-1,1,2,3,4,0,-3,-2,-1,
1,2,3,4,0,0,0]},
index = np.arange(1,21))
n=3
s = df['event'].rolling(n*2+1, center=True, min_periods=1).max()
s = s.cumsum()
l = ((s - s.where(s.duplicated()).ffill())).fillna(0)
l.update(l[l>n]+1)
df['Flag'] = (l - n-1).where(l.gt(0), 0)
print(df)
Output:
ID Place Date event Flag
1 1 A 2019-01-01 0 0.0
2 2 A 2019-01-02 0 0.0
3 3 A 2019-01-03 0 -3.0
4 4 A 2019-01-04 0 -2.0
5 5 A 2019-01-05 0 -1.0
6 6 A 2019-01-06 1 1.0
7 7 A 2019-01-07 0 2.0
8 8 A 2019-01-08 0 3.0
9 9 A 2019-01-09 0 4.0
10 10 A 2019-01-10 0 0.0
11 11 B 2019-01-01 0 -3.0
12 12 B 2019-01-02 0 -2.0
13 13 B 2019-01-03 0 -1.0
14 14 B 2019-01-04 1 1.0
15 15 B 2019-01-05 0 2.0
16 16 B 2019-01-06 0 3.0
17 17 B 2019-01-07 0 4.0
18 18 B 2019-01-08 0 0.0
19 19 B 2019-01-09 0 0.0
20 20 B 2019-01-10 0 0.0
Upvotes: 2
Reputation: 59549
We can use pd.merge_asof
to bring the closest event 'Date' to each row within each 'Place'. I use np.select
to get your Flag column correct, since it follows somewhat odd logic.
If windows overlap, you can change the ordering in conds
and choices
to give one priority over the other without overhauling any of the other logic.
import pandas as pd
import numpy as np
df['Date'] = pd.to_datetime(df.Date)
# Bring closest event date to `df`. Sort ruins order, but can fix later.
df = pd.merge_asof(df.sort_values('Date'),
(df.loc[df.Event.eq(1), ['Place', 'Date']]
.sort_values('Date')
.rename(columns={'Date': 'Date_2'})),
by='Place',
direction='nearest',
left_on='Date',
right_on='Date_2')
# Find day difference, needs to be adjusted in subsequent steps
df['Flag'] = (df['Date'] - df['Date_2']).dt.days
# Fix Flag to be desired final value.
conds = [df.Flag.ge(-3) & df.Flag.lt(0), df.Flag.ge(0) & df.Flag.le(3)]
choices = [df.Flag, df.Flag+1]
df['Flag'] = np.select(conds, choices, default=0)
print(df.sort_values('ID'))
ID Place Date Event Date_2 Flag
0 1 A 2019-01-01 0 2019-01-06 0
2 2 A 2019-01-02 0 2019-01-06 0
4 3 A 2019-01-03 0 2019-01-06 -3
6 4 A 2019-01-04 0 2019-01-06 -2
8 5 A 2019-01-05 0 2019-01-06 -1
10 6 A 2019-01-06 1 2019-01-06 1
12 7 A 2019-01-07 0 2019-01-06 2
15 8 A 2019-01-08 0 2019-01-06 3
17 9 A 2019-01-09 0 2019-01-06 4
18 10 A 2019-01-10 0 2019-01-06 0
1 11 B 2019-01-01 0 2019-01-04 -3
3 12 B 2019-01-02 0 2019-01-04 -2
5 13 B 2019-01-03 0 2019-01-04 -1
7 14 B 2019-01-04 1 2019-01-04 1
9 15 B 2019-01-05 0 2019-01-04 2
11 16 B 2019-01-06 0 2019-01-04 3
13 17 B 2019-01-07 0 2019-01-04 4
14 18 B 2019-01-08 0 2019-01-04 0
16 19 B 2019-01-09 0 2019-01-04 0
19 20 B 2019-01-10 0 2019-01-04 0
Upvotes: 1