Reputation: 7644
I have a df
id val1 val2
1 1.1 2.2
1 1.1 2.2
2 2.1 5.5
3 8.8 6.2
4 1.1 2.2
5 8.8 6.2
I want to group by val1 and val2 and get a similar dataframe containing only the rows whose val1 and val2 combination occurs more than once.
Final df:
id val1 val2
1 1.1 2.2
4 1.1 2.2
3 8.8 6.2
5 8.8 6.2
Upvotes: 40
Views: 80874
Reputation: 23391
Another method is to compute the size of groups and only keep the rows whose group is larger than 1.
msk = df.groupby(['val1', 'val2'])['val1'].transform('size') > 1
df1 = df[msk]
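To make the group-size approach easy to try, here is a minimal self-contained sketch. It assumes the frame from the question is rebuilt by hand; the name group_sizes is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# For every row, count how many rows share its (val1, val2) pair.
# transform returns a Series aligned with df's index.
group_sizes = df.groupby(["val1", "val2"])["val1"].transform("size")

# Keep only the rows whose pair occurs more than once.
df1 = df[group_sizes > 1]
print(df1)
```

Because transform keeps the original index, the comparison can be used directly as a boolean mask, and the threshold can be changed (for example > 2) if only larger groups are wanted.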
Upvotes: 0
Reputation: 863501
You need duplicated with the subset parameter to specify which columns to check and keep=False to mark every duplicate, then use the resulting mask to filter by boolean indexing:
df1 = df[df.duplicated(subset=['val1','val2'], keep=False)]
print (df1)
id val1 val2
0 1 1.1 2.2
1 1 1.1 2.2
3 3 8.8 6.2
4 4 1.1 2.2
5 5 8.8 6.2
Detail:
print (df.duplicated(subset=['val1','val2'], keep=False))
0 True
1 True
2 False
3 True
4 True
5 True
dtype: bool
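As a rough illustration of why keep=False matters, the sketch below compares it with the default keep='first' on the question's data. The frame is rebuilt by hand, and the names mask_all, mask_first and result are made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":   [1, 1, 2, 3, 4, 5],
    "val1": [1.1, 1.1, 2.1, 8.8, 1.1, 8.8],
    "val2": [2.2, 2.2, 5.5, 6.2, 2.2, 6.2],
})

# keep=False flags every row that belongs to a repeated (val1, val2) pair.
mask_all = df.duplicated(subset=["val1", "val2"], keep=False)

# The default keep="first" leaves the first occurrence of each pair unflagged,
# so filtering with it would drop one row per repeated pair.
mask_first = df.duplicated(subset=["val1", "val2"])

result = df[mask_all]
print(result)
print(mask_first.tolist())
```

With the default, rows 0 and 3 (the first occurrence of each repeated pair) are not flagged, which is why the answer passes keep=False.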
Upvotes: 73