Tatsuya

Reputation: 127

How to drop duplicates from a subset of rows in a pandas dataframe?

I have a dataframe like this:

A   B       C
12  true    1
12  true    1
3   nan     2
3   nan     3

I would like to drop all rows where the value of column A is a duplicate, but only if the value of column B is 'true'.

The resulting dataframe I have in mind is:

A   B       C
12  true    1
3   nan     2
3   nan     3

I tried using df.loc[df['B']=='true'].drop_duplicates('A', inplace=True, keep='first'), but it doesn't seem to work.
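The attempt leaves df unchanged because df.loc[mask] returns a new DataFrame, so inplace=True de-duplicates that temporary object, which is then discarded. A minimal sketch of one way to keep the original idea (find the duplicates among the B == True rows, then drop those labels from df itself), assuming B holds boolean True and NaN as in the answers' output; if B really contains the string 'true', compare against 'true' instead:

```python
import numpy as np
import pandas as pd

# Sample data from the question; assumes B holds boolean True / NaN
df = pd.DataFrame({"A": [12, 12, 3, 3],
                   "B": [True, True, np.nan, np.nan],
                   "C": [1, 1, 2, 3]})

# Look for duplicate A values only within the B == True rows
subset = df[df["B"].eq(True)]
to_drop = subset.index[subset["A"].duplicated(keep="first")]

# Drop those row labels from the full frame (returns a new DataFrame)
result = df.drop(index=to_drop)
print(result)
```

Working with index labels rather than with the filtered copy is what lets the change reach the original frame.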

Thanks for your help!

Upvotes: 10

Views: 2863

Answers (2)

piRSquared

Reputation: 294218

df[df.B.ne(True) | ~df.A.duplicated()]

    A     B  C
0  12  True  1
2   3   NaN  2
3   3   NaN  3
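The one-liner combines two boolean masks with a logical OR. A sketch of the same expression split into steps, assuming the sample data with B holding boolean True and NaN (the intermediate names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [12, 12, 3, 3],
                   "B": [True, True, np.nan, np.nan],
                   "C": [1, 1, 2, 3]})

# Rows whose B is anything other than True are always kept (NaN != True)
not_true = df.B.ne(True)        # [False, False, True, True]

# Rows whose A value has not appeared earlier in the column are kept too
first_a = ~df.A.duplicated()    # [True, False, True, False]

# A row survives if either condition holds
mask = not_true | first_a       # [True, False, True, True]
result = df[mask]
print(result)
```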

Upvotes: 5

BENY

Reputation: 323226

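You can use pd.concat: split the df by B, drop duplicates only in the part where B is True, then recombine and restore the original order with sort_index.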

df=pd.concat([df.loc[df.B!=True],df.loc[df.B==True].drop_duplicates(['A'],keep='first')]).sort_index()
df

Out[1593]: 
    A     B  C
0  12  True  1
2   3   NaN  2
3   3   NaN  3
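The two answers give the same result on the question's data, but they are not equivalent in general: this split approach looks for duplicates only among the B == True rows, while df.A.duplicated() in the mask approach scans the whole A column. A sketch on a hypothetical input (not from the question) where A=5 first appears with B=NaN and then once with B=True:

```python
import numpy as np
import pandas as pd

# Hypothetical input chosen to show where the two answers differ
df = pd.DataFrame({"A": [5, 5, 12, 12],
                   "B": [np.nan, True, True, True],
                   "C": [1, 2, 3, 4]})

# Split approach: duplicates are searched only within the B == True rows
kept_other = df.loc[df.B.ne(True)]
kept_true = df.loc[df.B.eq(True)].drop_duplicates(["A"], keep="first")
split_result = pd.concat([kept_other, kept_true]).sort_index()

# Mask approach: duplicated() sees the earlier A=5 row with B=NaN as well
mask_result = df[df.B.ne(True) | ~df.A.duplicated()]

print(split_result.index.tolist())  # [0, 1, 2]
print(mask_result.index.tolist())   # [0, 2]
```

Which behavior is wanted depends on whether an earlier occurrence of A in a non-True row should count as a duplicate.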

Upvotes: 11
