Reputation: 7723
I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1,1,1,1,1,1,1,2,2,2,2,2],
    'time_1': ['2173/04/01 12:35:00','2173/04/01 12:50:00','2173/04/02 12:59:00','2173/04/02 13:14:00','2173/04/04 13:37:00','2173/04/06 13:39:00','2173/04/06 11:30:00','2173/04/08 16:00:00','2173/04/09 22:00:00','2173/04/11 04:00:00','2173/04/13 04:30:00','2173/04/14 08:00:00'],
    'val': [0,0,0,0,1,0,0,0,0,0,0,0]
})
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day
For each subject, I would like to get two days, based on the following condition:
- If the subject has val = 1: fetch the two days preceding the day where val = 1
- If the subject only has val = 0: fetch the middle two days of the subject's duration
This is what I tried, but it isn't elegant or accurate.
con = lambda x: (x.eq(1))
con_1 = lambda x: (x.eq(0))
df.loc[df.groupby('subject_id')['val'].transform(con)]['time_1'] - timedelta(days = 2)
Following is my expected output: subject = 1 has val = 1, so we get the previous two days from val = 1 for him. For subject = 2, as there is no val = 1, we get the middle two days of his duration (8th - 14th).
Upvotes: 2
Views: 172
Reputation: 863031
I believe you need:
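def func(x):
    # Truncate each timestamp to midnight so comparisons work on whole days
    floor = x['time_1'].dt.floor('D')
    if x['val'].any():
        # Day of the first val == 1 row for this subject
        date = x.loc[x['val'] == 1, 'time_1'].iat[0].floor('D')
        new_date = date - pd.Timedelta(3, unit='D')
        # Exclusive bounds keep only the two days before `date`
        # (inclusive='neither' is the pandas >= 1.3 spelling of inclusive=False)
        return x[floor.between(new_date, date, inclusive='neither')]
    else:
        # No val == 1: take the mean day and the day after it as the middle two days
        date = floor.mean().floor('D')
        new_date = date + pd.Timedelta(1, unit='D')
        return x[floor.between(date, new_date)]

df = df.groupby('subject_id', group_keys=False).apply(func)
print(df)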
   subject_id              time_1  val  day
2           1 2173-04-02 12:59:00    0    2
3           1 2173-04-02 13:14:00    0    2
9           2 2173-04-11 04:00:00    0   11
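For readers on recent pandas versions, the same windowing logic can be sketched without `groupby(...).apply`, whose handling of the grouping column has changed across releases. This is only a minimal sketch under assumptions: `select_window` is a hypothetical helper name, "previous two days" is read as the two calendar days strictly before the first val = 1 day, and "middle" is taken as the mean of the observed days, as in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2173/04/01 12:35:00', '2173/04/01 12:50:00', '2173/04/02 12:59:00',
               '2173/04/02 13:14:00', '2173/04/04 13:37:00', '2173/04/06 13:39:00',
               '2173/04/06 11:30:00', '2173/04/08 16:00:00', '2173/04/09 22:00:00',
               '2173/04/11 04:00:00', '2173/04/13 04:30:00', '2173/04/14 08:00:00'],
    'val': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
})
df['time_1'] = pd.to_datetime(df['time_1'])


def select_window(group):
    """Hypothetical helper: keep the rows of one subject that fall in its two-day window."""
    days = group['time_1'].dt.normalize()  # timestamps truncated to midnight
    hits = group.loc[group['val'] == 1, 'time_1']
    if not hits.empty:
        # Assumption: the window is the two calendar days strictly before the first val == 1 day
        event_day = hits.iloc[0].normalize()
        start = event_day - pd.Timedelta(days=2)
        mask = (days >= start) & (days < event_day)
    else:
        # Assumption: "middle" means the mean of the observed days, plus the following day
        mid = days.mean().normalize()
        mask = (days >= mid) & (days <= mid + pd.Timedelta(days=1))
    return group[mask]


# Iterate over the groups explicitly and stitch the selected rows back together
result = pd.concat([select_window(g) for _, g in df.groupby('subject_id')])
print(result)
```

Note that the mean of the observed days is not always the midpoint between the first and last day; if the midpoint of the span is what is wanted, `mid` would have to be computed from `days.min()` and `days.max()` instead.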
Upvotes: 1