Reputation: 1960
I have a dataframe:
df = [type1 , type2 , type3 , val1, val2, val3
a b q 1 2 3
a c w 3 5 2
b c t 2 9 0
a b p 4 6 7
a c m 2 1 8
a b h 8 6 3
a b e 4 2 7]
I want to group by the columns type1 and type2 and drop from the dataframe every group that has more than 2 rows. So the new dataframe will be:
df = [type1 , type2 , type3 , val1, val2, val3
a c w 3 5 2
b c t 2 9 0
a c m 2 1 8
]
What is the best way to do so?
Upvotes: 3
Views: 1291
Reputation: 862681
Use GroupBy.transform to get the size of each group as a Series with the same length as the original DataFrame, then filter with Series.le (for <=) in boolean indexing:
df = df[df.groupby(['type1','type2'])['type1'].transform('size').le(2)]
print(df)
type1 type2 type3 val1 val2 val3
1 a c w 3 5 2
2 b c t 2 9 0
4 a c m 2 1 8
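For anyone who wants to reproduce this, here is a minimal self-contained sketch. The values are copied from the question; building the frame from a dict and the intermediate names `sizes` and `kept` are my own assumptions for illustration.

```python
import pandas as pd

# Sample data from the question (dict construction is an assumption).
df = pd.DataFrame({
    "type1": list("aabaaaa"),
    "type2": list("bccbcbb"),
    "type3": list("qwtpmhe"),
    "val1": [1, 3, 2, 4, 2, 8, 4],
    "val2": [2, 5, 9, 6, 1, 6, 2],
    "val3": [3, 2, 0, 7, 8, 3, 7],
})

# Size of each (type1, type2) group, broadcast back onto every row,
# so the result is aligned with df's index.
sizes = df.groupby(["type1", "type2"])["type1"].transform("size")

# Boolean mask: True where the row's group has at most 2 rows.
kept = df[sizes.le(2)]
print(kept)
```

Because `sizes` has the same index as `df`, the mask can be used directly without any merge or reindexing.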
If performance is not important, or the DataFrame is small, you can use DataFrameGroupBy.filter instead:
df = df.groupby(['type1','type2']).filter(lambda x: len(x) <= 2)
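As a quick sanity check, the sketch below runs both approaches on the question's data and compares the surviving rows. The names `by_filter` and `by_mask` are hypothetical, and the frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question (dict construction is an assumption).
df = pd.DataFrame({
    "type1": list("aabaaaa"),
    "type2": list("bccbcbb"),
    "type3": list("qwtpmhe"),
    "val1": [1, 3, 2, 4, 2, 8, 4],
    "val2": [2, 5, 9, 6, 1, 6, 2],
    "val3": [3, 2, 0, 7, 8, 3, 7],
})

keys = ["type1", "type2"]

# filter calls the lambda once per group and keeps groups where it returns True.
by_filter = df.groupby(keys).filter(lambda g: len(g) <= 2)

# transform computes the group sizes once and builds a row-aligned mask.
by_mask = df[df.groupby(keys)["type1"].transform("size").le(2)]

# Both approaches should keep the same rows.
print(by_filter.index.tolist() == by_mask.index.tolist())
```

The filter version calls a Python function for every group, which is why it tends to be slower when there are many groups.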
Upvotes: 5