Apook
Apook

Reputation: 449

Running count of rows before a specific date for each group in a dataframe

I have the following Pandas dataframe in Python:

ID      Date
E105    28/4/2021
E105    28/2/2021
E105    23/12/2020
E105    29/11/2020
E076    7/7/2021
E076    20/6/2021
E076    26/5/2021
E076    8/4/2021
E076    3/3/2021
E076    3/2/2021
E076    13/1/2021
E076    23/12/2020
E066    2/6/2021
E066    8/5/2021
E066    8/4/2021
E066    17/1/2021
E066    23/12/2020
E066    2/12/2020
E066    14/11/2020

sorted by ID and Date and I would like to make a new column counting how many times that ID occurs before that day: i.e.:

ID      Date        number of times before that day
E105    28/4/2021   3
E105    28/2/2021   2
E105    23/12/2020  1
E105    29/11/2020  0
E076    7/7/2021    7
E076    20/6/2021   6
E076    26/5/2021   5
E076    8/4/2021    4
E076    3/3/2021    3
E076    3/2/2021    2
E076    13/1/2021   1
E076    23/12/2020  0
E066    2/6/2021    6
E066    8/5/2021    5
E066    8/4/2021    4
E066    17/1/2021   3
E066    23/12/2020  2
E066    2/12/2020   1
E066    14/11/2020  0

I think it has something to do with groupby but I couldn't work it out.

Upvotes: 1

Views: 669

Answers (1)

cottontail
cottontail

Reputation: 23031

As the df is already sorted by Date (in descending order), it suffices to cumulatively count the occurrences of each ID in descending order (which can be achieved via ascending=False parameter).

df['number of times before that day'] = df.groupby('ID').cumcount(ascending=False)

Upvotes: 3

Related Questions