Kallol
Kallol

Reputation: 2189

Pandas group by date range with multiple columns

I have a data frame like this,

df
col1    col2          col3
 A      2021-02-01     P
 B      2021-02-12     P
 C      2021-02-08     Q
 A      2021-02-04     Q
 B      2021-02-14     Q
 A      2021-02-15     S

The col2 is pandas datetime type. Now I want to group col3 values on col1 and col2(with date date range of +- 4 days, ex: col3 values between 2021-02-01 and 2021-02-04 should be grouped against col1 values)

So the final data frame should look like,

col1    col3
 A      [P,Q]
 B      [P,Q]
 C      [Q]
 A      [S]

This could be done using a for loop and checking the date time differences but the execution time will be huge, so looking for pandas shortcuts to do this more efficiently.

Upvotes: 0

Views: 585

Answers (1)

jezrael
jezrael

Reputation: 862441

If need defined groups by each 4 days starting by first day of group use:

First create helper column for subtract dates by first value per groups by integer division of 4 and then use it for aggregate lists:

df['g'] = df['col2'].sub(df.groupby('col1')['col2'].transform('first')).dt.days // 4

df=df.groupby(['g','col1'])['col3'].agg(list).reset_index(level=0,drop=True).reset_index()
print (df)
  col1    col3
0    A  [P, Q]
1    B  [P, Q]
2    C     [Q]
3    A     [S]

Upvotes: 2

Related Questions