Identify consecutive date periods per group

Question

Would like to know how I can identify consecutive date periods with the period counter resetting per group.

This is my stab at it which does it across the whole DataFrame but can't workout how to do it per group.

import pandas as pd

data = {
    "peoples": ["jimbob","jimbob","jimbob", "jimbob","jimbob","jimbob", "sonnyjim","sonnyjim","sonnyjim","sonnyjim"],
    "dates": ["2020-11-01","2020-11-02","2020-11-03","2020-11-06","2020-11-09","2020-11-10", "2020-11-12","2020-11-13","2020-11-20","2020-11-22"]
}

df = pd.DataFrame(data)
df["dates"] = pd.to_datetime(df["dates"])

df["period"] = df["dates"].diff().dt.days.ne(1).cumsum()

print(df)

I'd like to be able to do something like this:

    peoples      dates  period
0    jimbob 2020-11-01       1
1    jimbob 2020-11-02       1
2    jimbob 2020-11-03       1
3    jimbob 2020-11-06       2
4    jimbob 2020-11-09       3
5    jimbob 2020-11-10       3
6  sonnyjim 2020-11-12       1
7  sonnyjim 2020-11-13       1
8  sonnyjim 2020-11-20       2
9  sonnyjim 2020-11-22       3

Shubham Sharma · Accepted Answer

You can groupby the dataframe on peoples then apply a custom lambda function on dates to calculate the consecutive date blocks:

f = lambda s: s.diff().dt.days.ne(1).cumsum()
df['period'] = df.groupby('peoples')['dates'].apply(f)

    peoples      dates  period
0    jimbob 2020-11-01       1
1    jimbob 2020-11-02       1
2    jimbob 2020-11-03       1
3    jimbob 2020-11-06       2
4    jimbob 2020-11-09       3
5    jimbob 2020-11-10       3
6  sonnyjim 2020-11-12       1
7  sonnyjim 2020-11-13       1
8  sonnyjim 2020-11-20       2
9  sonnyjim 2020-11-22       3

Identify consecutive date periods per group

Answers (1)

Related Questions