dleal

Reputation: 2314

Pandas GroupBy date range depending on each row

Given a data frame that looks like this:

df = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/24/2014","06/25/2014","06/23/2014","07/02/1999","07/02/1999"], "value": ["3","5","1","7","8"] })

I would like to group together all observations whose dates are within 2 days of each other. In this example, the first 3 rows would form one group and the last two rows another.

So far, I have thought of using something similar to:

df.groupby(df['date'].map(lambda x: x.month))

What is the general way to do this type of "fuzzy groupby"?

Thank you.

Upvotes: 2

Views: 3831

Answers (1)

unutbu

Reputation: 879113

You could sort the rows by date, then take the difference between consecutive dates. Flag each row whose difference from the previous row is greater than 2 days; each flag marks the start of a new group, so taking the cumulative sum of the flags assigns the desired group numbers:

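import pandas as pd
df = pd.DataFrame({"ID":["A", "A", "C" ,"B", "B"], "date":["06/24/2014","06/25/2014","06/23/2014","07/02/1999","07/02/1999"], "value": ["3","5","1","7","8"] })
# Parse the date strings so they can be sorted and subtracted
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')
# A gap of more than 2 days from the previous row starts a new group;
# the running count of such gaps is the group number
df['group'] = (df['date'].diff() > pd.Timedelta(days=2)).cumsum()
print(df)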

yields

  ID       date value  group
3  B 1999-07-02     7      0
4  B 1999-07-02     8      0
2  C 2014-06-23     1      1
0  A 2014-06-24     3      1
1  A 2014-06-25     5      1
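
Once the group column exists, it can be passed to groupby like any other key. The following is only a sketch of one way to use it: the summary column names (start, end, n_rows, total) and the conversion of value to numbers are my own assumptions for illustration, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "A", "C", "B", "B"],
    "date": ["06/24/2014", "06/25/2014", "06/23/2014", "07/02/1999", "07/02/1999"],
    "value": ["3", "5", "1", "7", "8"],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
# Assumption: the values are meant to be numeric (they are strings in the question)
df["value"] = pd.to_numeric(df["value"])

# Same grouping as above: a gap of more than 2 days starts a new group
df = df.sort_values(by="date")
df["group"] = (df["date"].diff() > pd.Timedelta(days=2)).cumsum()

# Hypothetical summary: date span, row count and value total per group
summary = df.groupby("group").agg(
    start=("date", "min"),
    end=("date", "max"),
    n_rows=("value", "size"),
    total=("value", "sum"),
)
print(summary)
```

Note that this groups by the gap between consecutive rows, so a long run of dates that are each within 2 days of the next ends up in a single group, even if the first and last dates of the run are more than 2 days apart.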

Upvotes: 8
