Reputation: 196
Q1. Given data frame 1, I am trying to get group-by unique new occurrences & another column that gives me existing ID count per month
ID Date
1 Jan-2020
2 Feb-2020
3 Feb-2020
1 Mar-2020
2 Mar-2020
3 Mar-2020
4 Apr-2020
5 Apr-2020
Expected output for unique newly added group-by ID values & for existing sum of ID values
Date ID_Count Existing_count
Jan-2020 1 0
Feb-2020 2 1
Mar-2020 0 3
Apr-2020 2 3
Note: Mar-2020 ID_Count is ZERO because ID 1, 2, and 3 were present in previous months.
Note: Existing count is 0 for Jan-2020 because there were zero IDs before Jan. The existing count for Feb-2020 is 1 because before Feb there was only 1. Mar-2020 has 3 existing counts as it adds Jan + Feb and so on
Upvotes: 2
Views: 108
Reputation: 153520
I think you can do it like this:
df['month'] = pd.to_datetime(df['Date'], format='%b-%Y')
# Find new IDs
df['new'] = df.groupby('ID').cumcount()==0
# Count new IDs by month
df_ct = df.groupby('month')['new'].sum().to_frame(name='ID_Count')
# Count all previous new IDs
df_ct['Existing_cnt'] = df_ct['ID_Count'].shift().cumsum().fillna(0).astype(int)
df_ct.index = df_ct.index.strftime('%b-%Y')
df_ct
Output:
ID_Count Existing_cnt
month
Jan-2020 1 0
Feb-2020 2 1
Mar-2020 0 3
Apr-2020 2 3
Upvotes: 4