Reputation: 473
How can I group by a unique identifier and count the number of past delinquencies('Bad') and past non delinquencies ('Good') before the most recent event.
For example, given the following dataframe:
ID Date Class
112 2018-02-12 Good
112 2019-01-20 Bad
113 2018-10-11 Bad
113 2019-01-01 Good
113 2020-02-03 Good
This should be the end goal:
ID Past_deliq Past_non_deliq Class Date
112 0 1 Bad 2019-01-20
113 1 1 Good 2020-02-03
I can get the most recent event by doing the following, df.loc[df.groupby('ID').Date.idxmax()]
, but I cant find a way to count past occurrences.
Any help is greatly appreciated.
Upvotes: 0
Views: 389
Reputation: 16375
b=df.groupby(["ID","Class"])["Class"].count().unstack()
You groupby both the ID and the Class which means you will get the count of each class for each ID. Than you call unstack which takes the left most labels from the index and inserts them as columns.
After you make another groupby by which you determine the last occurence(this solution has the assumption your data is ordered by date, if not use function max).
c=df.groupby("ID").agg({"Date":"last","Class":"last"})
After you merger the two dataframes.
b.merge(c, on="ID")
And you get what you requested.
Upvotes: 1
Reputation: 23099
Just some basic reshaping and crosstab
.
The idea is to filter your dataframe by values that aren't the max, do a values count aggregation and re-join your dataframe with the max dates.
max_date = df.groupby('ID')['Date'].max()
s1 = df.loc[~df.index.isin(df.groupby("ID")["Date"].idxmax())]
df1 = pd.crosstab(s1.ID, s1.Class).join(max_date).rename(
columns={"Bad": "Past_deliq", "Good": "Past_non_deliq"}
)
Past_deliq Past_non_deliq Date
ID
112 0 1 2019-01-20
113 1 1 2020-02-03
Upvotes: 1