Oaty
Oaty

Reputation: 351

Identify consecutive date periods per group

Would like to know how I can identify consecutive date periods with the period counter resetting per group.

This is my stab at it which does it across the whole DataFrame but can't workout how to do it per group.

import pandas as pd

data = {
    "peoples": ["jimbob","jimbob","jimbob", "jimbob","jimbob","jimbob", "sonnyjim","sonnyjim","sonnyjim","sonnyjim"],
    "dates": ["2020-11-01","2020-11-02","2020-11-03","2020-11-06","2020-11-09","2020-11-10", "2020-11-12","2020-11-13","2020-11-20","2020-11-22"]
}

df = pd.DataFrame(data)
df["dates"] = pd.to_datetime(df["dates"])

df["period"] = df["dates"].diff().dt.days.ne(1).cumsum()

print(df)

I'd like to be able to do something like this:

    peoples      dates  period
0    jimbob 2020-11-01       1
1    jimbob 2020-11-02       1
2    jimbob 2020-11-03       1
3    jimbob 2020-11-06       2
4    jimbob 2020-11-09       3
5    jimbob 2020-11-10       3
6  sonnyjim 2020-11-12       1
7  sonnyjim 2020-11-13       1
8  sonnyjim 2020-11-20       2
9  sonnyjim 2020-11-22       3

Upvotes: 1

Views: 57

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

You can groupby the dataframe on peoples then apply a custom lambda function on dates to calculate the consecutive date blocks:

f = lambda s: s.diff().dt.days.ne(1).cumsum()
df['period'] = df.groupby('peoples')['dates'].apply(f)

    peoples      dates  period
0    jimbob 2020-11-01       1
1    jimbob 2020-11-02       1
2    jimbob 2020-11-03       1
3    jimbob 2020-11-06       2
4    jimbob 2020-11-09       3
5    jimbob 2020-11-10       3
6  sonnyjim 2020-11-12       1
7  sonnyjim 2020-11-13       1
8  sonnyjim 2020-11-20       2
9  sonnyjim 2020-11-22       3

Upvotes: 2

Related Questions