The Great
The Great

Reputation: 7733

Elegant way to get size and unique count using pandas groupby

I have a dataframe like as shown below

ID     subject_id
3403       1
3478       1
3478       1
3478       1
3478       1
3478       1
3478       1
3481       1
3481       1
3481       3
3481       3

I would like to get the size() as well as no of unique subjects under each ID

So, I tried the below

df['s_cnt'] = df.groupby(['Id']).subject_id.nunique()
df['r_cnt'] = df.groupby(['Id']).size()

Is there anyway to do both the operations in a single line?

Any elegant approach would be helpful as I have to apply this on big data

I expect my output to be like as shown below

 Id   s_cnt   r_cnt
3403    1       1
3478    1       6
3481    2       4

Upvotes: 1

Views: 581

Answers (1)

jezrael
jezrael

Reputation: 863166

Use named aggregation, for function size is ppossible pass any column, e.g. here subject_id or ID, but for nunique is necessary pass column used for test number of unique values - here subject_id:

df1 = df.groupby('ID', as_index=False).agg(s_cnt = ('subject_id', 'nunique'),
                                           r_cnt= ('subject_id', 'size'))

Or:

df1 = df.groupby('ID', as_index=False).agg(s_cnt = ('subject_id', 'nunique'),
                                           r_cnt= ('ID', 'size'))

print (df1)
     ID  s_cnt  r_cnt
0  3403      1      1
1  3478      1      6
2  3481      2      4

EDIT: For oldier pandas versiond use reset_index instead as_index=False:

df1 = df.groupby('ID').agg(s_cnt = ('subject_id', 'nunique'),
                           r_cnt= ('subject_id', 'size')).reset_index()

Upvotes: 3

Related Questions