Dropping selected rows in Pandas with duplicated columns

Question

Suppose I have a dataframe like this:

fname    lname     email
Joe      Aaron
Joe      Aaron     some@some.com
Bill     Smith
Bill     Smith
Bill     Smith     some2@some.com

Is there a terse and convenient way to drop rows where {fname, lname} is duplicated and email is blank?

jpp · Accepted Answer

First check whether your "empty" values are NaN or empty strings. If they are a mixture, you may need to modify the logic below.
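One way to run that check is sketched below. The sample frame is a hypothetical reconstruction of the question's data with the blanks deliberately mixed (NaN, empty string, whitespace only), and the names `n_nan`, `is_blank_str` and `n_blank_str` are illustrative only. If both kinds turn up, converting the blank strings to NaN leaves a single kind of "empty" to handle.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: NaN, empty-string and whitespace-only emails mixed.
df = pd.DataFrame({
    "fname": ["Joe", "Joe", "Bill", "Bill", "Bill"],
    "lname": ["Aaron", "Aaron", "Smith", "Smith", "Smith"],
    "email": [np.nan, "some@some.com", "", "  ", "some2@some.com"],
})

# Count each kind of "empty" value separately.
n_nan = df["email"].isna().sum()
is_blank_str = df["email"].str.strip().eq("")  # NaN compares as False here
n_blank_str = is_blank_str.sum()
print(n_nan, n_blank_str)

# Replace blank strings with NaN so only one kind of "empty" remains.
df["email"] = df["email"].mask(is_blank_str)
```

Once every blank is NaN, the default ascending sort is enough, because `sort_values` places NaN last unless told otherwise.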

# NaN sorts last by default, so the first row kept per name has an email if one exists
df = df.sort_values('email')\
       .drop_duplicates(['fname', 'lname'])

If your empty values are empty strings rather than NaN, they sort first in ascending order, so you need to specify ascending=False when sorting:

df = df.sort_values('email', ascending=False)\
       .drop_duplicates(['fname', 'lname'])

print(df)

  fname  lname           email
4  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
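Note that sorting and then calling `drop_duplicates` keeps exactly one row per {fname, lname} pair, so two rows for the same person with different non-blank emails would also collapse to one. If only the blank rows inside duplicated groups should go, a boolean mask is a possible alternative. This is a minimal sketch, not part of the answer above: it treats NaN, empty and whitespace-only values as blank, and the "Ann Lee" row is a hypothetical addition showing that a non-duplicated blank row is kept.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical non-duplicated row (Ann Lee).
df = pd.DataFrame({
    "fname": ["Joe", "Joe", "Bill", "Bill", "Bill", "Ann"],
    "lname": ["Aaron", "Aaron", "Smith", "Smith", "Smith", "Lee"],
    "email": [np.nan, "some@some.com", "", np.nan, "some2@some.com", np.nan],
})

# True for every row whose (fname, lname) pair occurs more than once.
is_dup = df.duplicated(["fname", "lname"], keep=False)

# True for NaN, empty, or whitespace-only emails.
is_blank = df["email"].fillna("").str.strip().eq("")

# Drop only rows that are both duplicated and blank.
result = df[~(is_dup & is_blank)]
print(result)
```

One caveat with this variant: if every row in a duplicated group has a blank email, all of them are dropped, whereas the sort-based approach would keep one.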
