Monica Heddneck

Reputation: 3125

Group by and sum in Pandas

Say I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID':    [1, 2, 2, 3, 3, 4, 4, 4, 5]})

Class ID
A     1 
A     2 
A     2 
A     3 
A     3 
B     4 
B     4 
B     4 
B     5 

I'd like to summarize the data like this:

Class  count(distinct(ID))
A      3
B      2

I know this is pretty trivial, but I've gotten stuck here:

df.groupby(by=['Class', 'ID']).count()

which gives me

[screenshot of the output, not reproduced here]

I can't figure out how to get the per-class count of distinct IDs after the groupby. Thanks.
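A minimal sketch of one way to finish the two-key groupby, assuming the two-column frame above: because both Class and ID become index levels, count() has no other columns left to count. Counting rows per (Class, ID) group with size(), and then counting those groups within each Class, gives the number of distinct IDs per class.

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID':    [1, 2, 2, 3, 3, 4, 4, 4, 5]})

# One entry per (Class, ID) pair; the value is how often that pair occurs
pair_sizes = df.groupby(['Class', 'ID']).size()

# Count the pairs within each Class index level, i.e. distinct IDs per class
distinct_per_class = pair_sizes.groupby(level='Class').size()
print(distinct_per_class)
```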

Upvotes: 1

Views: 751

Answers (2)

Andy Hayden

Reputation: 375485

I think you're looking for nunique, which counts the distinct values in each group:

In [11]: df.groupby("Class")["ID"].nunique()
Out[11]:
Class
A    3
B    2
Name: ID, dtype: int64
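If you want a DataFrame with a named column, closer to the desired table in the question, nunique can also be used through named aggregation (available since pandas 0.25). This is a sketch; the column name distinct_ids is an illustrative choice, not something from the question. Note that nunique ignores NaN values by default (dropna=True).

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID':    [1, 2, 2, 3, 3, 4, 4, 4, 5]})

# Named aggregation: new column distinct_ids = number of unique IDs per Class
# (distinct_ids is a hypothetical, freely chosen output name)
summary = (df.groupby('Class')
             .agg(distinct_ids=('ID', 'nunique'))
             .reset_index())  # turn the Class index back into a column
print(summary)
```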

Upvotes: 2

attitude_stool

Reputation: 1023

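(df[['Class', 'ID']]
 .drop_duplicates()   # keep one row per (Class, ID) pair
 .groupby('Class')    # group the unique pairs by class
 .count())            # rows left per class = distinct IDs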
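The same drop-duplicates idea can be sketched with value_counts instead of groupby; this is a swapped-in alternative, not part of the answer above. After removing repeated (Class, ID) rows, each remaining row stands for one distinct ID, so counting rows per Class gives the totals. sort_index() only orders the result by class.

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID':    [1, 2, 2, 3, 3, 4, 4, 4, 5]})

# Keep one row per (Class, ID) pair, then count the rows for each Class
distinct_counts = (df.drop_duplicates(subset=['Class', 'ID'])['Class']
                     .value_counts()
                     .sort_index())
print(distinct_counts)
```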

Upvotes: 1
