Reputation: 2189
I have a data frame like this,
df
col1 col2 col3
A 2021-02-01 P
B 2021-02-12 P
C 2021-02-08 Q
A 2021-02-04 Q
B 2021-02-14 Q
A 2021-02-15 S
The col2 is pandas datetime type. Now I want to group col3 values on col1 and col2(with date date range of +- 4 days, ex: col3 values between 2021-02-01 and 2021-02-04 should be grouped against col1 values)
So the final data frame should look like,
col1 col3
A [P,Q]
B [P,Q]
C [Q]
A [S]
This could be done using a for loop and checking the date time differences but the execution time will be huge, so looking for pandas shortcuts to do this more efficiently.
Upvotes: 0
Views: 585
Reputation: 862441
If need defined groups by each 4 days starting by first day of group use:
First create helper column for subtract dates by first value per groups by integer division of 4
and then use it for aggregate lists:
df['g'] = df['col2'].sub(df.groupby('col1')['col2'].transform('first')).dt.days // 4
df=df.groupby(['g','col1'])['col3'].agg(list).reset_index(level=0,drop=True).reset_index()
print (df)
col1 col3
0 A [P, Q]
1 B [P, Q]
2 C [Q]
3 A [S]
Upvotes: 2