Say I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'Class': ['A','A','A','A','A','B','B','B','B'],
                   'ID': [1,2,2,3,3,4,4,4,5]})
Class ID
A 1
A 2
A 2
A 3
A 3
B 4
B 4
B 4
B 5
I'd like to summarize the data like this:
Class count(distinct(ID))
A 3
B 2
I know this is pretty trivial, but I've gotten stuck here:
df.groupby(by=['Class', 'ID']).count()
which gives me one row per (Class, ID) pair instead of one row per Class.
I can't work out how to get from there to a count of distinct IDs per Class. Thanks.
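The groupby attempt is one step away from the answer: grouping by both columns already produces one group per distinct (Class, ID) pair, so counting those groups within each Class gives the number of distinct IDs. A minimal sketch, assuming pandas is installed; the variable names `pairs` and `distinct_ids` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID': [1, 2, 2, 3, 3, 4, 4, 4, 5]})

# Grouping by both columns gives one group per distinct (Class, ID) pair;
# .size() returns the number of rows in each of those groups.
pairs = df.groupby(['Class', 'ID']).size()

# Counting the pairs within each Class (first index level) yields the
# number of distinct IDs per Class.
distinct_ids = pairs.groupby(level='Class').size()
print(distinct_ids)
```

Here `.size()` is used instead of `.count()` because, with only the two grouping columns present, there are no remaining columns for `.count()` to count.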
I think you're looking for nunique:
In [11]: df.groupby("Class")["ID"].nunique()
Out[11]:
Class
A 3
B 2
Name: ID, dtype: int64
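A self-contained version of the nunique approach, also showing one way to turn the resulting Series into a two-column table like the one asked for. This is a sketch assuming pandas is installed; the column name `distinct_ID` is a hypothetical choice, since the question's `count(distinct(ID))` header is SQL-style notation rather than a pandas name:

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID': [1, 2, 2, 3, 3, 4, 4, 4, 5]})

# nunique counts the distinct non-null ID values within each Class.
per_class = df.groupby('Class')['ID'].nunique()

# reset_index(name=...) moves Class back into a column and names the
# counts column, giving a flat two-column DataFrame.
summary = per_class.reset_index(name='distinct_ID')
print(summary)
```

Keeping the result as a Series (`per_class`) is convenient for lookups such as `per_class['A']`; the flat DataFrame is easier to export or merge.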
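Alternatively, drop the duplicate (Class, ID) rows first and then count what is left in each Class:
(df[['Class','ID']]
 .drop_duplicates()   # keep one row per distinct (Class, ID) pair
 .groupby('Class')
 .count())            # rows left per Class = number of distinct IDs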
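A quick check, assuming pandas is installed, that the drop_duplicates route and nunique agree on the example data (the names `via_dedup` and `via_nunique` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Class': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'ID': [1, 2, 2, 3, 3, 4, 4, 4, 5]})

# Deduplicate the (Class, ID) pairs, then count the remaining IDs per Class.
via_dedup = df[['Class', 'ID']].drop_duplicates().groupby('Class')['ID'].count()

# Count distinct IDs per Class directly.
via_nunique = df.groupby('Class')['ID'].nunique()

print(via_dedup)
print(via_nunique)
```

Both ignore missing IDs: drop_duplicates keeps one NaN row per Class, but count() skips NaN, which matches nunique's default of dropna=True.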