Reputation: 449
I have the following Pandas dataframe in Python:
ID Date
E105 28/4/2021
E105 28/2/2021
E105 23/12/2020
E105 29/11/2020
E076 7/7/2021
E076 20/6/2021
E076 26/5/2021
E076 8/4/2021
E076 3/3/2021
E076 3/2/2021
E076 13/1/2021
E076 23/12/2020
E066 2/6/2021
E066 8/5/2021
E066 8/4/2021
E066 17/1/2021
E066 23/12/2020
E066 2/12/2020
E066 14/11/2020
sorted by ID
and Date
and I would like to make a new column counting how many times that ID occurs before that day: i.e.:
ID Date number of times before that day
E105 28/4/2021 3
E105 28/2/2021 2
E105 23/12/2020 1
E105 29/11/2020 0
E076 7/7/2021 7
E076 20/6/2021 6
E076 26/5/2021 5
E076 8/4/2021 4
E076 3/3/2021 3
E076 3/2/2021 2
E076 13/1/2021 1
E076 23/12/2020 0
E066 2/6/2021 6
E066 8/5/2021 5
E066 8/4/2021 4
E066 17/1/2021 3
E066 23/12/2020 2
E066 2/12/2020 1
E066 14/11/2020 0
I think it has something to do with groupby
but I couldn't work it out.
Upvotes: 1
Views: 669
Reputation: 23031
As the df is already sorted by Date (in descending order), it suffices to cumulatively count the occurrences of each ID in descending order (which can be achieved via ascending=False
parameter).
df['number of times before that day'] = df.groupby('ID').cumcount(ascending=False)
Upvotes: 3