Reputation: 3583
I know that we can use .nunique() on a groupby column to find out the unique number of elements in the column like below:
df = pd.DataFrame({'c1':['foo', 'bar', 'foo', 'foo'], 'c2': ['A', 'B', 'A', 'B'], 'c3':[1, 2, 1, 1]})
c1 c2 c3
0 foo A 1
1 bar B 2
2 foo A 1
3 foo B 1
df.groupby('c1')['c2'].nunique()
c1
bar 1
foo 2
Name: c2, dtype: int64
However, now I have a groupby object that contains multiple columns, is there any way to find out the number of unique rows?
df.groupby('c1')['c2', 'c3'].???
Update: So the end result I want is the number of unique rows within each group that's grouped based on the 'c1' column, such as this:
foo 2
bar 1
Update 2: Here's a new test dataframe:
df = pd.DataFrame({'c1': ['foo', 'bar', 'foo', 'foo', 'bar'], 'c2': ['A'
, 'B', 'A', 'B', 'A'], 'c3': [1, 2, 1, 1, 1]})
Upvotes: 1
Views: 311
Reputation: 863166
If need by nunique
concanecated column c2
and c3
, the easier is use:
df['c'] = df.c2 + df.c3.astype(str)
print (df.groupby('c1')['c'].nunique())
c1
bar 1
foo 2
Name: c, dtype: int64
Or groupby
by Series
c
by column df.c1
:
c = df.c2.astype(str) + df.c3.astype(str)
print (c.groupby([df.c1]).nunique())
c1
bar 2
foo 2
dtype: int64
Upvotes: 0
Reputation: 3583
Finally figured out how to do this!
import pandas as pd
import numpy as np
df = pd.DataFrame({'c1': ['foo', 'bar', 'foo', 'foo', 'bar'],
'c2': ['A', 'B', 'A', 'B', 'A'],
'c3': [1, 2, 1, 1, 1]})
def check_unique(df):
return len(df.groupby(list(df.columns.values)))
print(df.groupby('c1').apply(check_unique))
Upvotes: 0
Reputation: 210882
UPDATE:
In [131]: df.groupby(['c1','c2','c3']).size().rename('count').reset_index()[['c1','count']].drop_duplicates(subset=['c1'])
Out[131]:
c1 count
0 bar 1
1 foo 2
OLD answer:
IIYC you need this:
In [43]: df.groupby(['c1','c2','c3']).size()
Out[43]:
c1 c2 c3
bar B 2 1
foo A 1 2
B 1 1
dtype: int64
Upvotes: 1