groupby, count past occurrences of events, and show the most recent event

Question

How can I group by a unique identifier and count the number of past delinquencies ('Bad') and past non-delinquencies ('Good') before the most recent event?

For example, given the following dataframe:

ID    Date         Class    
112   2018-02-12    Good
112   2019-01-20    Bad
113   2018-10-11    Bad
113   2019-01-01    Good
113   2020-02-03    Good

This should be the end goal:

ID    Past_deliq  Past_non_deliq  Class   Date
112      0           1             Bad    2019-01-20
113      1           1             Good   2020-02-03

I can get the most recent event with df.loc[df.groupby('ID').Date.idxmax()], but I can't find a way to count past occurrences.

Any help is greatly appreciated.

Umar.H · Accepted Answer

Just some basic reshaping and crosstab.

The idea is to filter out each ID's most recent row, count the remaining rows per class, and join the result back to the most recent dates.

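import pandas as pd

# Most recent date per ID.
max_date = df.groupby('ID')['Date'].max()
# Drop each ID's most recent row, leaving only past events.
s1 = df.loc[~df.index.isin(df.groupby("ID")["Date"].idxmax())]

# Count past events per ID and class, then attach the most recent date.
df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
    columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"}
)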



     Past_deliq  Past_non_deliq       Date
ID                                        
112           0               1 2019-01-20
113           1               1 2020-02-03
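
The output above does not carry the Class column that the question's end goal asks for, and crosstab only returns IDs that have at least one past event. The following is a minimal sketch of one way to cover both points, not part of the original answer. It assumes Date is parsed as datetime and that the only classes are 'Bad' and 'Good'; the names latest_idx, latest, past, counts, and result are illustrative.

```python
import pandas as pd

# Sample data from the question, with Date parsed as datetime (an assumption).
df = pd.DataFrame(
    {
        "ID": [112, 112, 113, 113, 113],
        "Date": pd.to_datetime(
            ["2018-02-12", "2019-01-20", "2018-10-11", "2019-01-01", "2020-02-03"]
        ),
        "Class": ["Good", "Bad", "Bad", "Good", "Good"],
    }
)

# Row label of the most recent event for each ID.
latest_idx = df.groupby("ID")["Date"].idxmax()
latest = df.loc[latest_idx].set_index("ID")[["Class", "Date"]]

# Everything except the most recent event per ID.
past = df.drop(index=latest_idx)

# Count past events per ID and class. Reindexing keeps IDs with no past
# events and classes that never occur, filling their counts with zero.
counts = (
    pd.crosstab(past["ID"], past["Class"])
    .reindex(index=latest.index, columns=["Bad", "Good"], fill_value=0)
    .rename(columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"})
)

# Attach the class and date of the most recent event.
result = counts.join(latest).reset_index()
print(result)
```

The reindex step is a design choice: without it, an ID whose only row is its most recent event would be missing from the crosstab and would drop out of the final join.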
