Reputation: 17154
This must be a simple question, but, however, it is bugging my head for a while.
For a dataframe below:
df = pd.DataFrame({'c0': ['a','b','a'],'c1': ['a','bb','a'],'c2':[10,20,30]})
c0 c1 c2
0 a a 10
1 b bb 20
2 a a 30
How to get output where count > 1?
I have tried:
df.groupby(['c0','c1'])['c2'].count()
c0 c1
a a 2
b bb 1
Required is:
c0 c1
a a 2
I am looking other than
x = df.groupby(['c0','c1'])['c2'].count()
x[x>1]
i.e. a one-liner answer.
Upvotes: 18
Views: 32301
Reputation: 17154
Here is the one-liner:
In [8]: df.groupby(['c0','c1'])['c2'].count().pipe(lambda dfx: dfx.loc[dfx>1])
Out[8]:
c0 c1
a a 2
Name: c2, dtype: int64
Upvotes: 1
Reputation: 862701
Use GroupBy.transform
for Series with same size like original DataFrame:
df1 = df[df.groupby(['c0','c1'])['c2'].transform('count') > 1]
Or use DataFrame.duplicated
for filtered all dupe rows by specified columns in list:
df1 = df[df.duplicated(['c0','c1'], keep=False)]
If performance is in not important or small DataFrame use DataFrameGroupBy.filter
:
df1 = df.groupby(['c0','c1']).filter(lambda x: len(x) > 1)
Upvotes: 42