Reputation: 701
I have the dataframe that looks like follows :
A_key t_num r_num o_st
292 A1 16 TS Q
812 A1 16 TS Q
173 A5 15 TT Q
955 A7 19 TQ Q
804 A7 17 TR Q
49 A7 18 TQ Q
668 A1 28 None P
788 A4 88 None P
693 A7 17 TS Q
781 A1 22 None P
I want to select all distinct rows for a unique combination of A_key and t_num for o_st = 'P' and all distinct rows for a unique combination of A_key and r_num for o_st = 'Q' into a new dataframe. Appreciate any responses
Expected df looks like the following
A_key t_num r_num o_st
292 A1 16 TS Q
173 A5 15 TT Q
955 A7 19 TQ Q
804 A7 17 TR Q
668 A1 28 None P
788 A4 88 None P
693 A7 17 TS Q
781 A1 22 None P
Upvotes: 1
Views: 49
Reputation: 51165
where
+ drop_duplicates
df['key'] = df['t_num'].where(df['o_st'].eq('P'), df['r_num'])
df.drop_duplicates(subset=['A_key', 'key']).drop('key', 1)
A_key t_num r_num o_st
292 A1 16 TS Q
173 A5 15 TT Q
955 A7 19 TQ Q
804 A7 17 TR Q
668 A1 28 None P
788 A4 88 None P
693 A7 17 TS Q
781 A1 22 None P
You have two distinct subsets that you want to check for uniqueness depending on the value of the third column. Instead of calling drop_duplicates
twice, I create a mask that contains the correct subset value before calling drop_duplicates
a single time.
Upvotes: 1