Python Pandas - Select duplicated rows where one column does not vary across other rows

Question

I have a dataframe like this:

    import pandas as pd
    # named data rather than dict to avoid shadowing the built-in
    data = {'col_a': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
            'col_b': ['xyz', 'xyz', 'xyw', 'xyw', 'abc', 'abe', 'pqr', 'pqr']}
    dt = pd.DataFrame(data)
    print(dt)

    col_a   col_b
    A       xyz
    A       xyz
    A       xyw
    A       xyw
    B       abc
    B       abe
    C       pqr
    C       pqr

I want to get all rows where the combination of col_a and col_b is repeated, but only when every row with that col_a value has the same col_b value, something like this:

    col_a   col_b
    C       pqr
    C       pqr

Notes:

I tried using the pandas.DataFrame.duplicated function, but the result contains all rows with A and C in col_a:

    dt[dt.duplicated(subset=['col_a', 'col_b'], keep=False)]

    col_a   col_b
    A       xyz
    A       xyz
    A       xyw
    A       xyw
    C       pqr
    C       pqr

Thank you for your help and attention

BENY · Accepted Answer

It seems you need to combine duplicated with a check that each col_a group contains only one distinct col_b value:

    dt[dt.duplicated(keep=False) & dt.groupby('col_a')['col_b'].transform('nunique').eq(1)]
    Out[662]:
      col_a col_b
    6     C   pqr
    7     C   pqr
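
To see why this works, it may help to split the mask into its two parts. The following is only a sketch on the question's data; the names is_repeated and n_distinct are illustrative and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question
dt = pd.DataFrame({
    'col_a': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'col_b': ['xyz', 'xyz', 'xyw', 'xyw', 'abc', 'abe', 'pqr', 'pqr'],
})

# True for every row whose (col_a, col_b) pair occurs more than once
is_repeated = dt.duplicated(subset=['col_a', 'col_b'], keep=False)

# Number of distinct col_b values within each col_a group,
# broadcast back so it lines up with the original rows
n_distinct = dt.groupby('col_a')['col_b'].transform('nunique')

# Keep rows that are repeated AND whose group has a single col_b value
result = dt[is_repeated & n_distinct.eq(1)]
print(result)
```

The rows with A pass the first test but fail the second, because A appears with both xyz and xyw. The rows with B fail the first test, since neither pair repeats. Only the rows with C satisfy both conditions.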
