python pandas: Simplest way to perform group by and extract count of unique entities?

Question

my df:

nr,name
1,sam
2,sam
1,mar
1,sam
2,tom
2,jack
1,mar

How can I group by 'nr' and count distinct names in 'name' column? this must be a very easy command in all languages like mysql(groupby and distinct command) but I cannot find this in pandas. Can anybody help?

Zero · Accepted Answer

Use nunique()

In [13]: df.groupby('nr')['name'].nunique()
Out[13]:
nr
1     2
2     3

Alternatively, use pd.Series.nunique

In [14]: df.groupby('nr').agg({'name': pd.Series.nunique})
Out[14]:
    name
nr
1      2
2      3

Also, you could use nunique() in agg()

In [15]: df.groupby('nr').agg({'name': lambda x: x.nunique()})
Out[15]:
    name
nr
1      2
2      3

Interestingly, at times, I noticed len(x.unique()) is much faster than above methods.

In [16]: df.groupby('nr').agg({'name': lambda x: len(x.unique())})
Out[16]:
    name
nr
1      2
2      3

python pandas: Simplest way to perform group by and extract count of unique entities?

Answers (2)

Related Questions