Kor

Reputation: 63

Dropping duplicates based on other column values (Python)

I have a dataframe with 3 columns. I would like to drop duplicates in column A based on the values in the other columns. I have searched tirelessly and can't find a solution like this.

example:

A B C
Family1 nan nan
Family1 nan 1234
Family1 1245 nan
Family1 3456 78787
Family2 nan nan
Family3 nan nan

Basically, I want to drop a duplicate ONLY IF the rest of the columns are both NaN; otherwise, the duplicate can stay.

desired output:

A B C
Family1 nan 1234
Family1 1245 nan
Family1 3456 78787
Family2 nan nan
Family3 nan nan

Family2 and Family3 remain in the df because they don't have duplicates, even though both of their other columns are NaN.
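
For reference, a minimal snippet that reconstructs the example frame (dtypes are assumed; B and C are treated as numeric here):

import numpy as np
import pandas as pd

# Rebuild the sample data shown above
df = pd.DataFrame({
    'A': ['Family1', 'Family1', 'Family1', 'Family1', 'Family2', 'Family3'],
    'B': [np.nan, np.nan, 1245, 3456, np.nan, np.nan],
    'C': [np.nan, 1234, np.nan, 78787, np.nan, np.nan],
})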

Upvotes: 1

Views: 71

Answers (2)

Umar.H

Reputation: 23099

Try a double boolean mask: one that is True for every duplicate in column A, and one that is True where all columns after 'A' are null. Rows where both conditions are met are excluded with the ~ operator, which inverts a boolean mask.

df[~(df.duplicated(subset=['A'], keep=False) & df.iloc[:, 1:].isna().all(axis=1))]

          A     B        C
1  Family1    NaN     1234
2  Family1   1245      NaN
3  Family1   3456    78787
4  Family2    NaN      NaN
5  Family3    NaN      NaN
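
The same mask can also be built in steps, which may make the logic easier to follow (the intermediate variable names are mine):

is_dup = df.duplicated(subset=['A'], keep=False)  # True for every row whose A value repeats
all_nan = df.iloc[:, 1:].isna().all(axis=1)       # True where every column after A is NaN
result = df[~(is_dup & all_nan)]                  # drop rows only when both conditions hold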

Upvotes: 3

wwnde

Reputation: 26686

You were not very clear. I suspect you want to drop any duplicates in column A only if both columns B and C are NaN. If so, please try:

df[~(df.A.duplicated(keep=False) & (df.B.isna() & df.C.isna()))]
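
Note that this variant hard-codes columns B and C, whereas the first answer's df.iloc[:, 1:] covers every column after A; with only the three columns shown, the two filters should be equivalent. The B/C check can also be written as df[['B', 'C']].isna().all(axis=1).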

Upvotes: 3
