UserYmY
UserYmY

Reputation: 8564

python pandas: Simplest way to perform group by and extract count of unique entities?

my df:

nr,name
1,sam
2,sam
1,mar
1,sam
2,tom
2,jack
1,mar

How can I group by 'nr' and count distinct names in 'name' column? this must be a very easy command in all languages like mysql(groupby and distinct command) but I cannot find this in pandas. Can anybody help?

Upvotes: 0

Views: 203

Answers (2)

Zero
Zero

Reputation: 77007

Use nunique()

In [13]: df.groupby('nr')['name'].nunique()
Out[13]:
nr
1     2
2     3

Alternatively, use pd.Series.nunique

In [14]: df.groupby('nr').agg({'name': pd.Series.nunique})
Out[14]:
    name
nr
1      2
2      3

Also, you could use nunique() in agg()

In [15]: df.groupby('nr').agg({'name': lambda x: x.nunique()})
Out[15]:
    name
nr
1      2
2      3

Interestingly, at times, I noticed len(x.unique()) is much faster than above methods.

In [16]: df.groupby('nr').agg({'name': lambda x: len(x.unique())})
Out[16]:
    name
nr
1      2
2      3

Upvotes: 1

EdChum
EdChum

Reputation: 394279

You want nunique:

In [4]:

df.groupby('nr')['name'].nunique()
Out[4]:
nr
1    2
2    3
Name: name, dtype: int64

So this returns the number of unique values for that column grouped by 'nr'

Upvotes: 3

Related Questions