Pandas groupby take counts greater than 1

Question

This must be a simple question, but, however, it is bugging my head for a while.

For a dataframe below:

df = pd.DataFrame({'c0': ['a','b','a'],'c1': ['a','bb','a'],'c2':[10,20,30]})
  c0  c1  c2
0  a   a  10
1  b  bb  20
2  a   a  30

How to get output where count > 1?

I have tried:

df.groupby(['c0','c1'])['c2'].count()
c0  c1
a   a     2
b   bb    1

Required is:

c0  c1
a   a     2

I am looking other than

x = df.groupby(['c0','c1'])['c2'].count()
x[x>1]

i.e. a one-liner answer.

jezrael · Accepted Answer

Use GroupBy.transform for Series with same size like original DataFrame:

df1 = df[df.groupby(['c0','c1'])['c2'].transform('count') > 1]

Or use DataFrame.duplicated for filtered all dupe rows by specified columns in list:

df1 = df[df.duplicated(['c0','c1'], keep=False)]

If performance is in not important or small DataFrame use DataFrameGroupBy.filter:

df1 = df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)

Answers (2)