Reputation: 351
Would like to know how I can identify consecutive date periods with the period counter resetting per group.
This is my stab at it which does it across the whole DataFrame but can't workout how to do it per group.
import pandas as pd
data = {
"peoples": ["jimbob","jimbob","jimbob", "jimbob","jimbob","jimbob", "sonnyjim","sonnyjim","sonnyjim","sonnyjim"],
"dates": ["2020-11-01","2020-11-02","2020-11-03","2020-11-06","2020-11-09","2020-11-10", "2020-11-12","2020-11-13","2020-11-20","2020-11-22"]
}
df = pd.DataFrame(data)
df["dates"] = pd.to_datetime(df["dates"])
df["period"] = df["dates"].diff().dt.days.ne(1).cumsum()
print(df)
I'd like to be able to do something like this:
peoples dates period
0 jimbob 2020-11-01 1
1 jimbob 2020-11-02 1
2 jimbob 2020-11-03 1
3 jimbob 2020-11-06 2
4 jimbob 2020-11-09 3
5 jimbob 2020-11-10 3
6 sonnyjim 2020-11-12 1
7 sonnyjim 2020-11-13 1
8 sonnyjim 2020-11-20 2
9 sonnyjim 2020-11-22 3
Upvotes: 1
Views: 57
Reputation: 71689
You can groupby
the dataframe on peoples
then apply a custom lambda
function on dates
to calculate the consecutive date blocks:
f = lambda s: s.diff().dt.days.ne(1).cumsum()
df['period'] = df.groupby('peoples')['dates'].apply(f)
peoples dates period
0 jimbob 2020-11-01 1
1 jimbob 2020-11-02 1
2 jimbob 2020-11-03 1
3 jimbob 2020-11-06 2
4 jimbob 2020-11-09 3
5 jimbob 2020-11-10 3
6 sonnyjim 2020-11-12 1
7 sonnyjim 2020-11-13 1
8 sonnyjim 2020-11-20 2
9 sonnyjim 2020-11-22 3
Upvotes: 2