Reputation: 2269
Suppose I have a dataframe like this:
fname lname email
Joe Aaron
Joe Aaron [email protected]
Bill Smith
Bill Smith
Bill Smith [email protected]
Is there a terse and convenient way to drop rows where {fname, lname} is duplicated and email is blank?
Upvotes: 0
Views: 68
Reputation: 164613
You should first check whether your "empty" data is NaN
or empty strings. If they are a mixture, you may need to modify the below logic.
Using pd.DataFrame.sort_values
and pd.DataFrame.drop_duplicates
:
df = df.sort_values('email')\
.drop_duplicates(['fname', 'lname'])
If your empty rows are strings, you need to specify ascending=False
when sorting:
df = df.sort_values('email', ascending=False)\
.drop_duplicates(['fname', 'lname'])
print(df)
fname lname email
4 Bill Smith [email protected]
1 Joe Aaron [email protected]
Upvotes: 1
Reputation: 323226
You can using first
with groupby
(Notice replace empty with np.nan, since the first
will return the first not null value for each columns)
df.replace('',np.nan).groupby(['fname','lname']).first().reset_index()
Out[20]:
fname lname email
0 Bill Smith [email protected]
1 Joe Aaron [email protected]
Upvotes: 0