iprof0214
iprof0214

Reputation: 701

select all duplicates on two different column combinations based on conditions

I have the dataframe that looks like follows :

     A_key     t_num    r_num  o_st 
292  A1         16       TS      Q
812  A1         16       TS      Q
173  A5         15       TT      Q
955  A7         19       TQ      Q
804  A7         17       TR      Q
49   A7         18       TQ      Q
668  A1         28       None    P 
788  A4         88       None    P 
693  A7         17       TS      Q
781  A1         22       None    P

I want to select all distinct rows for a unique combination of A_key and t_num for o_st = 'P' and all distinct rows for a unique combination of A_key and r_num for o_st = 'Q' into a new dataframe. Appreciate any responses

Expected df looks like the following

   A_key     t_num    r_num  o_st 
292  A1         16       TS      Q
173  A5         15       TT      Q
955  A7         19       TQ      Q
804  A7         17       TR      Q
668  A1         28       None    P 
788  A4         88       None    P 
693  A7         17       TS      Q
781  A1         22       None    P

Upvotes: 1

Views: 49

Answers (1)

user3483203
user3483203

Reputation: 51165

where + drop_duplicates

df['key'] = df['t_num'].where(df['o_st'].eq('P'), df['r_num'])

df.drop_duplicates(subset=['A_key', 'key']).drop('key', 1)

    A_key  t_num r_num o_st
292    A1     16    TS    Q
173    A5     15    TT    Q
955    A7     19    TQ    Q
804    A7     17    TR    Q
668    A1     28  None    P
788    A4     88  None    P
693    A7     17    TS    Q
781    A1     22  None    P

You have two distinct subsets that you want to check for uniqueness depending on the value of the third column. Instead of calling drop_duplicates twice, I create a mask that contains the correct subset value before calling drop_duplicates a single time.

Upvotes: 1

Related Questions