Remove duplicate rows upon multiple conditions

Question

I have a data frame with duplicate rows. Some columns have data other have zeros. I want to retain the rows with data and eliminate those with zeros, but I also want to remove duplicates when there is only zeros. Here is an example

df =

A   B
x   0
x   1
y   0
y   0
z   1

I want it to be

A   B
x   1
y   0
z   1

I am struggling to drop duplicates hierarchically; first keeping the nonzero rows where columns values are different from zero, and then removing duplicates of the rows with column values are zero.

thank you

Scott Boston · Accepted Answer

Use sort_values and parameter keep='last':

df.sort_values(['A','B']).drop_duplicates(subset='A', keep='last')

Output:

Update for negative values:

df.assign(sortkey = df.B.ne(0))
  .sort_values(['A','sortkey'])
  .drop_duplicates(subset='A', keep='last')

Remove duplicate rows upon multiple conditions

Answers (2)

Related Questions