BhishanPoudel
BhishanPoudel

Reputation: 17154

Pandas groupby take counts greater than 1

This must be a simple question, but, however, it is bugging my head for a while.

For a dataframe below:

df = pd.DataFrame({'c0': ['a','b','a'],'c1': ['a','bb','a'],'c2':[10,20,30]})
  c0  c1  c2
0  a   a  10
1  b  bb  20
2  a   a  30

How to get output where count > 1?

I have tried:

df.groupby(['c0','c1'])['c2'].count()
c0  c1
a   a     2
b   bb    1

Required is:

c0  c1
a   a     2

I am looking other than

x = df.groupby(['c0','c1'])['c2'].count()
x[x>1]

i.e. a one-liner answer.

Upvotes: 18

Views: 32301

Answers (2)

BhishanPoudel
BhishanPoudel

Reputation: 17154

Here is the one-liner:

In [8]: df.groupby(['c0','c1'])['c2'].count().pipe(lambda  dfx: dfx.loc[dfx>1])
Out[8]: 
c0  c1
a   a     2
Name: c2, dtype: int64

Upvotes: 1

jezrael
jezrael

Reputation: 862701

Use GroupBy.transform for Series with same size like original DataFrame:

df1 = df[df.groupby(['c0','c1'])['c2'].transform('count') > 1]

Or use DataFrame.duplicated for filtered all dupe rows by specified columns in list:

df1 = df[df.duplicated(['c0','c1'], keep=False)]

If performance is in not important or small DataFrame use DataFrameGroupBy.filter:

df1 = df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)

Upvotes: 42

Related Questions