Reputation: 33
I have a dataframe like this:
import pandas as pd dict = {'col_a':['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'col_b':['xyz','xyz','xyw','xyw','abc','abe','pqr','pqr']} dt = pd.DataFrame(dict) print(dt) col_a col_b A xyz A xyz A xyw A xyw B abc B ade C pqr C pqr
I want to get all rows where col_a and col_b are repeated, but col_b must not be different even if col_a is the same, some like this:
col_a col_b C pqr C pqr
Notes:
dt[dt.duplicated(subset=['col_a', 'col_b'], keep=False)] col_a col_b A xyz A xyz A xyw A xyw C pqr C pqr
Thank you for your help and attention
Upvotes: 0
Views: 56
Reputation: 323226
Seems like you need
dt[dt.duplicated(keep=False)&(dt.groupby(['col_a'])['col_b'].transform('nunique').eq(1))]
Out[662]:
col_a col_b
6 C pqr
7 C pqr
Upvotes: 3