Group by and sum in Pandas

Question

Say I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'Class': ['A','A','A','A','A','B','B','B','B'],
                   'ID': [1,2,2,3,3,4,4,4,5]})

Class ID
A     1 
A     2 
A     2 
A     3 
A     3 
B     4 
B     4 
B     4 
B     5

I'd like to summarize the data like this:

Class  count(distinct(ID))
A      3
B      2

I know this is pretty trivial, but I've gotten stuck here:

df.groupby(by=['Class', 'ID']).count()

which gives me

I can't seem to get the summation after the group by for some reason. Thanks.

Andy Hayden · Accepted Answer

I think you're looking for nunique:

In [11]: df.groupby("Class")["ID"].nunique()
Out[11]:
Class
A    3
B    2
Name: ID, dtype: int64
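For anyone who wants to run this end to end, here is a minimal sketch. It assumes the column names are exactly "Class" and "ID" as in the printed table; the variable names and the "distinct_ids" column label are made up for illustration. Grouping by both "Class" and "ID" produces one group per (Class, ID) pair, which is probably why the attempt in the question did not collapse to one number per class; grouping by "Class" alone and calling nunique on "ID" does.

```python
import pandas as pd

# Sample data from the question (column names assumed to be "Class" and "ID").
df = pd.DataFrame({
    "Class": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "ID": [1, 2, 2, 3, 3, 4, 4, 4, 5],
})

# Number of distinct IDs per class, returned as a Series indexed by Class.
distinct_ids = df.groupby("Class")["ID"].nunique()

# The same counts as a regular two-column DataFrame.
# "distinct_ids" is a hypothetical column label chosen for this example.
summary = distinct_ids.reset_index(name="distinct_ids")

# One possible alternative: drop duplicate (Class, ID) pairs, then count rows per class.
alt = df.drop_duplicates(["Class", "ID"]).groupby("Class").size()

print(summary)
```

The reset_index step is only needed if you want "Class" back as an ordinary column rather than as the index.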
