The Great

Reputation: 7723

How to get previous and middle dates in each group using pandas

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'time_1': ['2173/04/01 12:35:00', '2173/04/01 12:50:00', '2173/04/02 12:59:00',
               '2173/04/02 13:14:00', '2173/04/04 13:37:00', '2173/04/06 13:39:00',
               '2173/04/06 11:30:00', '2173/04/08 16:00:00', '2173/04/09 22:00:00',
               '2173/04/11 04:00:00', '2173/04/13 04:30:00', '2173/04/14 08:00:00'],
    'val': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
})
df['time_1'] = pd.to_datetime(df['time_1'])
df['day'] = df['time_1'].dt.day


For each subject, I would like to get two days, chosen by the following conditions:

val = 1: fetch the two days before the date on which val = 1

val = 0: fetch the middle two days of the subject's duration

This is what I tried, but it isn't elegant or accurate.

from datetime import timedelta

con = lambda x: (x.eq(1))
con_1 = lambda x: (x.eq(0))
df.loc[df.groupby('subject_id')['val'].transform(con)]['time_1'] - timedelta(days = 2)

My expected output was posted as an image. The logic behind it: subject_id = 1 has a row with val = 1, so we take the two days before that row's date. subject_id = 2 has no row with val = 1, so we take the middle two days of its duration (the 8th to the 14th).

Upvotes: 2

Views: 172

Answers (1)

jezrael

Reputation: 863031

I believe you need:

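def func(x):
    # Calendar day of each row (time of day dropped)
    floor = x['time_1'].dt.floor('d')

    if x['val'].any():
        # Day of the first row with val == 1
        date = x.loc[x['val'] == 1, 'time_1'].iat[0].floor('d')
        new_date = date - pd.Timedelta(3, unit='d')
        # Strictly between the bounds, i.e. the two days before `date`.
        # Newer pandas expects a string here; 'neither' excludes both endpoints.
        return x[floor.between(new_date, date, inclusive='neither')]
    else:
        # Mean day of the group plus the following day
        date = floor.mean().floor('d')
        new_date = date + pd.Timedelta(1, unit='d')
        return x[floor.between(date, new_date)]

df = df.groupby('subject_id', group_keys=False).apply(func)
print(df)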
   subject_id              time_1  val  day
2           1 2173-04-02 12:59:00    0    2
3           1 2173-04-02 13:14:00    0    2
9           2 2173-04-11 04:00:00    0   11

Upvotes: 1
