Ronnie

Reputation: 391

Get only matching rows for groups in Pandas groupby

I have the following df:

import pandas as pd

d = {"Col1":['a','d','b','c','a','d','b','c'],
"Col2":['x','y','x','z','x','y','z','y'],
"Col3":['n','m','m','l','m','m','l','l'],
"Col4":[1,4,2,2,1,4,2,2]}

df = pd.DataFrame(d)

When I group by three columns, I get this result:

gb = df.groupby(['Col1', 'Col2', 'Col3'])['Col4'].agg(['sum', 'mean'])


How can I extract only the rows of a group that match at least one row of another group on the grouped columns? Please see the picture below; I want to get the highlighted rows.

[image: the grouped result with the matching rows highlighted]

I want to get the rows in red, based on the ones in blue and black, which match each other.

Apologies if my statement is ambiguous. Any help would be appreciated.

Upvotes: 1

Views: 1441

Answers (2)

Scott Boston

Reputation: 153460

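You can call reset_index, then use duplicated with keep=False to build a boolean mask and filter the dataframe with it. This keeps every row that shares its Col2 and Col3 values with at least one other row: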

gb = gb.reset_index()
gb[gb.duplicated(subset=['Col2','Col3'], keep=False)]

Output:

  Col1 Col2 Col3  sum  mean
0    a    x    m    1     1
2    b    x    m    2     2
3    b    z    l    2     2
5    c    z    l    2     2

Upvotes: 2

BramV

Reputation: 556

Make a table with all allowed combinations and then inner join it with this dataframe.
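
One way this suggestion might look in practice is sketched below. It assumes that "allowed combinations" means the (Col2, Col3) pairs shared by more than one group, which is my reading of the question rather than something this answer states. The names pair_counts, allowed, n_groups and matched are made up for the illustration.

```python
import pandas as pd

# Data and aggregation from the question
d = {"Col1": ['a', 'd', 'b', 'c', 'a', 'd', 'b', 'c'],
     "Col2": ['x', 'y', 'x', 'z', 'x', 'y', 'z', 'y'],
     "Col3": ['n', 'm', 'm', 'l', 'm', 'm', 'l', 'l'],
     "Col4": [1, 4, 2, 2, 1, 4, 2, 2]}
df = pd.DataFrame(d)

gb = (df.groupby(['Col1', 'Col2', 'Col3'])['Col4']
        .agg(['sum', 'mean'])
        .reset_index())

# Count how many groups share each (Col2, Col3) pair
# (n_groups is a hypothetical column name)
pair_counts = gb.groupby(['Col2', 'Col3']).size().reset_index(name='n_groups')

# Assumption: a pair is "allowed" when more than one group has it
allowed = pair_counts.loc[pair_counts['n_groups'] > 1, ['Col2', 'Col3']]

# The inner join drops every row whose (Col2, Col3) pair is not in `allowed`
matched = gb.merge(allowed, on=['Col2', 'Col3'], how='inner')
print(matched)
```

On this data the join should select the same four groups as the duplicated approach in the other answer, although the merged result gets a fresh index. Building the allowed table by hand instead would make sense if the valid combinations come from somewhere other than the data itself.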

Upvotes: 0
