Reputation: 3583

How to find out number of unique rows within a pandas groupby object?

I know that we can use .nunique() on a groupby column to find out the unique number of elements in the column like below:

df = pd.DataFrame({'c1':['foo', 'bar', 'foo', 'foo'], 'c2': ['A', 'B', 'A', 'B'], 'c3':[1, 2, 1, 1]})

    c1 c2  c3
0  foo  A   1
1  bar  B   2
2  foo  A   1
3  foo  B   1

df.groupby('c1')['c2'].nunique()

c1
bar    1
foo    2
Name: c2, dtype: int64

However, now I have a groupby object that contains multiple columns, is there any way to find out the number of unique rows?

df.groupby('c1')['c2', 'c3'].???

Update: So the end result I want is the number of unique rows within each group that's grouped based on the 'c1' column, such as this:

foo 2
bar 1

Update 2: Here's a new test dataframe:

df = pd.DataFrame({'c1': ['foo', 'bar', 'foo', 'foo', 'bar'], 'c2': ['A'
, 'B', 'A', 'B', 'A'], 'c3': [1, 2, 1, 1, 1]})

Upvotes: 1

Answers (3)

jezrael

Reputation: 863166

If need by nunique concanecated column c2 and c3, the easier is use:

df['c'] = df.c2 + df.c3.astype(str)
print (df.groupby('c1')['c'].nunique())
c1
bar    1
foo    2
Name: c, dtype: int64

Or groupby by Series c by column df.c1:

c = df.c2.astype(str) + df.c3.astype(str)
print (c.groupby([df.c1]).nunique())
c1
bar    2
foo    2
dtype: int64

Upvotes: 0

George Liu

Reputation: 3583

Finally figured out how to do this!

import pandas as pd
import numpy as np

df = pd.DataFrame({'c1': ['foo', 'bar', 'foo', 'foo', 'bar'],
                   'c2': ['A', 'B', 'A', 'B', 'A'],
                   'c3': [1, 2, 1, 1, 1]})

def check_unique(df):
    return len(df.groupby(list(df.columns.values)))

print(df.groupby('c1').apply(check_unique))

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210882

UPDATE:

In [131]: df.groupby(['c1','c2','c3']).size().rename('count').reset_index()[['c1','count']].drop_duplicates(subset=['c1'])
Out[131]:
    c1  count
0  bar      1
1  foo      2

OLD answer:

IIYC you need this:

In [43]: df.groupby(['c1','c2','c3']).size()
Out[43]:
c1   c2  c3
bar  B   2     1
foo  A   1     2
     B   1     1
dtype: int64

Upvotes: 1

How to find out number of unique rows within a pandas groupby object?

Answers (3)

Related Questions