Reputation: 8564
my df:
nr,name
1,sam
2,sam
1,mar
1,sam
2,tom
2,jack
1,mar
How can I group by 'nr' and count distinct names in 'name' column? this must be a very easy command in all languages like mysql(groupby and distinct command) but I cannot find this in pandas. Can anybody help?
Upvotes: 0
Views: 203
Reputation: 77007
Use nunique()
In [13]: df.groupby('nr')['name'].nunique()
Out[13]:
nr
1 2
2 3
Alternatively, use pd.Series.nunique
In [14]: df.groupby('nr').agg({'name': pd.Series.nunique})
Out[14]:
name
nr
1 2
2 3
Also, you could use nunique()
in agg()
In [15]: df.groupby('nr').agg({'name': lambda x: x.nunique()})
Out[15]:
name
nr
1 2
2 3
Interestingly, at times, I noticed len(x.unique())
is much faster than above methods.
In [16]: df.groupby('nr').agg({'name': lambda x: len(x.unique())})
Out[16]:
name
nr
1 2
2 3
Upvotes: 1