Reputation: 7733
I have a dataframe like as shown below
ID subject_id
3403 1
3478 1
3478 1
3478 1
3478 1
3478 1
3478 1
3481 1
3481 1
3481 3
3481 3
I would like to get the size()
as well as no of unique subjects under each ID
So, I tried the below
df['s_cnt'] = df.groupby(['Id']).subject_id.nunique()
df['r_cnt'] = df.groupby(['Id']).size()
Is there anyway to do both the operations in a single line?
Any elegant approach would be helpful as I have to apply this on big data
I expect my output to be like as shown below
Id s_cnt r_cnt
3403 1 1
3478 1 6
3481 2 4
Upvotes: 1
Views: 581
Reputation: 863166
Use named aggregation, for function size
is ppossible pass any column, e.g. here subject_id
or ID
, but for nunique
is necessary pass column used for test number of unique values - here subject_id
:
df1 = df.groupby('ID', as_index=False).agg(s_cnt = ('subject_id', 'nunique'),
r_cnt= ('subject_id', 'size'))
Or:
df1 = df.groupby('ID', as_index=False).agg(s_cnt = ('subject_id', 'nunique'),
r_cnt= ('ID', 'size'))
print (df1)
ID s_cnt r_cnt
0 3403 1 1
1 3478 1 6
2 3481 2 4
EDIT: For oldier pandas versiond use reset_index
instead as_index=False
:
df1 = df.groupby('ID').agg(s_cnt = ('subject_id', 'nunique'),
r_cnt= ('subject_id', 'size')).reset_index()
Upvotes: 3